Abstract

We consider distributed optimization over orthogonal collision channels in spatial random access 
networks. Users are spatially distributed and each user is in the interference range of a few other users. 

Each user is allowed to transmit over a subset of the shared channels with a certain attempt probability. 

We study both the non-cooperative and cooperative settings. In the former, the goal of each user is to 
maximize its own rate irrespective of the utilities of other users. In the latter, the goal is to achieve 
proportionally fair rates among users. Simple distributed learning algorithms are developed to solve these 
problems. The efficiencies of the proposed algorithms are demonstrated via both theoretical analysis 
and simulation results. 
Index Terms: Distributed optimization, collision channel, slotted-ALOHA, Nash equilibrium,
proportional fairness.


I. Introduction 

Spectrum scarcity, along with the increasing demand for wireless communication, has triggered
the development of efficient spectrum access schemes for wireless networks. A good overview 
of the various dynamic spectrum access models for MAC can be found in [2]. In this paper we 
focus on the open sharing model among users that acts as the basis for managing a spectral 
region (e.g., WiFi, cognitive radio, sensor networks, and unlicensed band technology [2]). 

Consider a spatial wireless network with N users sharing K collision channels. Each user is 
in the interference range of a few (but not necessarily all) other users, referred to as neighbors 
(e.g., when the distance between users is small they cause mutual interference). At the beginning
of each time slot, each user is allowed to transmit over M channels (1 ≤ M ≤ K) with a certain
attempt probability (i.e., using the slotted-ALOHA protocol). If two or more neighbors transmit 
simultaneously over the same channel, a collision occurs. In multi-channel systems, exploiting 
the channel diversity plays an important role in designing effective channel allocation protocols. 
The channel conditions are a function of both the inherent quality of each channel due to fading, 
shadowing, etc., as well as the interference caused by the users that use the channel. Thus, it 
is intuitive that users can improve performance by adaptively choosing channels with a higher 
probability of being idle as well as higher capacity when idle. We are interested in finding a 
channel allocation and attempt probabilities in a distributed manner so as to optimize certain 
objectives in the network. 

A. Game Theory, Distributed Optimization, and Learning for Spectrum Access Protocols

Spectrum access protocols can be broadly classified into two classes: (i) protocols in which 
users do not share information with each other, due to security or overhead considerations, and 
(ii) protocols in which information is shared to achieve a common goal, such as in networks 
which are controlled by a single provider. Achieving an effective channel allocation for the 
spectrum access problem in a distributed manner requires users to adaptively adjust their actions 
(i.e., select channels and attempt probabilities) based on local information about the current 
state of the system. Thus, the first question of interest is whether the system keeps oscillating 
due to frequent channel switching, or whether the system converges to a stable operating point.
When users do not share information, a stable channel allocation may not be a system-wide 
optimal solution (though it reduces the undesirable effects of frequent channel switching and
has also demonstrated good performance in some network models and typical scenarios, as in
[3]-[5]). Thus, the second question of interest is whether small amounts of information sharing can
lead to a globally-optimal operating point. 

Game theory provides a rich set of tools to analyze the dynamic behavior of a system when
entities (or players) in the system take actions to optimize a predefined objective. Thus, using
game theoretic models to analyze wireless networking protocols and algorithms, in which users
(i.e., players) adjust their strategies (e.g., attempt probabilities, transmission power, selected
channels, etc.) so as to optimize the system performance, has attracted much attention in
recent years. Related work on networking games can be found under a non-cooperative setting
in [5]-[24] and under a cooperative setting in [9], [11], [13], [14], [17]-[19], [24], [25]. Since
we are generally interested in networking protocols that require small amounts of information
sharing (if any) and are distributed in nature, it is often desired to develop efficient distributed
learning and optimization methods to achieve the target solution.


B. Main Results 

We first examine the case where users do not share information with each other. The achievable
rate of each user increases with its own attempt probability when the other attempt probabilities
are fixed. Thus, a natural approach to achieve a good operating point is to allow every user to
maximize its own rate under a constraint on the allowed attempt probability¹ (where different
attempt probability constraints are used to prioritize different users in the network), referred to
as distributed rate maximization. Next, we summarize our main results in this respect.

(i) In [5], the special case of a fully connected network (i.e., all users are in the same
interference range) and M = 1 (i.e., each user is allowed to transmit over only one channel) was
considered, and a distributed algorithm was applied to solve the distributed rate maximization
problem, in which each user updates its strategy using its local channel state information (CSI)
and by monitoring the load over the channels. It was shown that any finite improvement path (not
necessarily best-response) across users, in which at each iteration the rate of a single user
increases when it updates its channel-selection strategy given the current system state, reaches
an equilibrium in the sense that no user can increase its rate by unilaterally changing its
strategy. In this paper, however, we consider a more general case where each user interferes only
with its neighbors, and M > 1 (i.e., every user is allowed to transmit over multiple channels).
Interestingly, we show that cycles may occur under some improvement paths in this general model.
To solve this problem, we use the theory of best-response (BR) potential games, introduced by
Voorneveld in 2000 [26]. In BR potential games, cycles may occur under some improvement paths,
though no cycles occur under BR dynamics. We prove that the system dynamics can be formulated as
a BR potential game. This result constitutes an important contribution from a game theoretic
perspective as well as a MAC design perspective, since it generalizes existing results on Nash
equilibria (NE) in [5], [27], [28] (see a more detailed discussion in Section I-C).

(ii) Based on our analysis, we then propose a distributed BR learning algorithm that solves the
distributed rate maximization problem and converges to an equilibrium in finite time. The
convergence result described above requires a coordination mechanism that enables users to update
their actions sequentially. We then propose a simpler mechanism that guarantees convergence as
time increases even without coordination among users (thus, users may update actions
simultaneously). We further extend our convergence result to cases where each user may have a
different set of available resources, which captures the situation of a hierarchical model as in
cognitive radio networks (see Section III for details). Thus, these results enable us to design
MAC protocols for a wide range of practical system models.

(iii) Since multiple NEs may exist, we finally analyze the efficiency of the NEs that the
algorithm may converge to. It should be noted that very little is known about the efficiency of
the NEs under models related to the one considered in this paper, particularly when interference
across users forms a graph structure. A popular performance measure for the efficiency of a NE is
the Price of Anarchy (PoA), which is the ratio between the optimal performance and the performance
at the worst equilibrium. The PoA (with respect to the sum utility) has been analyzed in [29]
under the special case of a fully connected network (i.e., the interference graph is complete) and
equal attempt probability for all users. In this paper, we analyze performance at equilibrium on
average rather than worst-case performance, which is useful particularly in the context of
wireless networks since we are generally interested in the expected performance of users in the
long run. Specifically, it is shown that under some mild conditions (see Section III-C for
details), implementing the proposed algorithm on regular conflict graphs guarantees that every
user in the system improves its performance (in terms of expected rate) as compared to a naive
algorithm, in which users choose channels for transmission randomly without performing the
congestion monitoring used to adjust their strategies as proposed in this paper. A significant
performance gain (more than 170% improvement) is obtained under a low collision level.

¹ Similar approaches were applied in [5], [7], [15], all resulting in an individual rate and attempt probability for every user.

Second, we focus on a cooperative setting, in which the goal is to achieve the optimal channel
allocation and attempt probabilities that attain proportionally fair rates in the network. When
K = 1 (i.e., a single channel case), users have no freedom to choose among different channels,
and the action of each user degenerates to setting the optimal attempt probability for transmission
over the single channel. Low-complexity algorithms have been developed in [30]-[32] under
various models of a single collision channel. In this paper, however, we address this question
for multi-channel networks (i.e., K > 1) where every user is allowed to choose a single channel
for transmission (i.e., M = 1) among the K channels and to set the optimal attempt probability
for transmission over the channel². Direct computation of the optimal channel allocation and
attempt probabilities that attain proportionally fair rates for the multi-channel ALOHA network
considered in this paper is a combinatorial optimization problem over a graph. Furthermore, it
requires a centralized solution that uses global information, which is impractical in large-scale
networks. Next, we summarize our main results in this respect.

(i) We study the problem from a game theoretic perspective and develop a novel cooperative
distributed algorithm based on log-linear learning, referred to as noisy BR, to achieve the target
solution in a distributed manner. Specifically, at each iteration, using message exchanges between
neighbors only, selected users take actions with respect to a cooperative utility that balances
between their own utilities and the interference level they cause to their neighbors given the
current system state. In noisy BR dynamics, users play the BR that maximizes their cooperative
utilities with high probability, while suboptimal responses are taken with small probabilities to
escape local maxima. We prove that the proposed cooperative algorithm converges to the global
proportional fairness solution with high probability as time increases. Furthermore, we show that
every Nash equilibrium attained by the algorithm can be reached in finite time by playing BR, and
it is a good operating point in the sense that proportionally fair rates are attained locally
among all users sharing the same channel.

(ii) The proposed algorithm significantly simplifies the implementation as compared to
existing methods. First, it requires less information sharing between nodes. Second,
synchronization in a neighborhood with respect to action updates is not required (see Section
I-C for a more detailed discussion on related works).

² Accessing a single channel is often assumed due to hardware constraints or when it is desired to limit the congestion level
in highly loaded systems. It has been widely assumed in cognitive radio applications, WiFi, sensor networks, etc. It should be
noted, however, that developing a tractable optimal solution for the proportional fairness problem under the case where users
are allowed to access two or more channels at a time remains an open question.
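
As an illustration of the noisy BR idea summarized above, the following minimal Python sketch implements the standard log-linear choice rule, in which an action is drawn with probability proportional to exp(beta * U). It is a sketch under stated assumptions rather than the algorithm developed in this paper: the cooperative utility is treated as a given list of numbers, and the names `log_linear_probs`, `noisy_best_response`, and the parameter `beta` are hypothetical.

```python
import math
import random


def log_linear_probs(utilities, beta):
    """Return P(a) proportional to exp(beta * U(a)) for each action a.

    `beta` (assumed name) controls the noise level: a large beta concentrates
    the probability on the maximizer, beta = 0 gives a uniform choice.
    """
    top = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (x - top)) for x in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def noisy_best_response(utilities, beta, rng):
    """Sample an action index: usually the BR, occasionally a suboptimal one."""
    probs = log_linear_probs(utilities, beta)
    return rng.choices(range(len(utilities)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    utilities = [1.0, 2.0, 1.5]  # illustrative utilities of three channels
    print(log_linear_probs(utilities, beta=2.0))
    print(noisy_best_response(utilities, beta=2.0, rng=rng))
```

With a large `beta` the rule behaves almost like a plain BR, while a small `beta` lets a user leave a locally optimal channel with noticeable probability.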

C. Related Work 

Spectrum access and sharing have attracted much attention in past and recent years. We next
discuss related works that use game theoretic models, distributed optimization, and learning
techniques, some of which have been discussed in Sections I-A and I-B, and highlight the main
differences in the model, analysis, and results obtained in this paper as compared to the related
existing studies.

ALOHA-based Protocols and Cross Layer Optimization. ALOHA-based protocols have 
been widely used in wireless communication primarily because of their ease of implementation 
and their random nature. Related work on ALOHA-based protocols can be found in [5]-[7], 
[15], [24], [33]-[38] for fully connected networks and in [28], [30]-[32], [39]-[42] for spatial 
networks. Stability of a selfish behavior dynamics in a single-channel ALOHA system was 
studied in [6]. Equilibria under rate demands have been analyzed in [7], [15]. In this paper, 
however, we focus on the multi-channel case. In [5], [28], the multi-channel ALOHA case was 
studied, where M = 1. In [28], the authors have developed a distributed algorithm, in which a 
mixed strategy was applied to obtain local information in a spatially distributed network. In [5], 
a pure strategy was applied, where the local information was obtained by sensing the spectrum in 
a fully connected network. When M = 1, the log-rate of each user under an ALOHA model can 
be expressed as a linear combination of its inherent log-rate minus the log-interference caused 
by its neighbors (i.e., an affine function, see (3) for details). As a result, due to the monotonicity
of the logarithm, analysis of Nash equilibria when M = 1 under the non-cooperative setting 
follows by applying a variation of the ordinal potential function introduced in [27] for affine 
utilities. Thus, any improvement path (not necessarily best-response) across users, in which at 
each iteration the rate of a user increases when it updates its channel-selection strategy given the 
current system state (i.e., sequential updating), reaches an equilibrium in the sense that no user 
can increase its rate by unilaterally changing its strategy. In this paper, however, we consider
the case where M > 1, in which cycles may occur under some improvement paths and the
dynamics do not obey an ordinal potential function. Thus, our Nash equilibria analysis using
the theory of best-response potential games (as described in Section I-B) generalizes the Nash
equilibria results obtained in [5], [27], [28]. It also generalizes the equilibria results in [43] (that
assumes that each node contributes equally to the congestion of a resource) due to different
attempt probabilities across users considered here. It should be noted that avoiding simultaneous
updates across users can be done by allowing each user to draw a random backoff time and
update its strategy when the backoff time expires. However, we will show convergence of the
algorithm even without this mechanism. Stability of multi-channel ALOHA systems was studied
in [33], [34], [44], [45]. In [39], [41], spatial single-channel ALOHA networks have been studied
under interference channels using stochastic geometry. Opportunistic ALOHA schemes that use
cross-layer MAC/PHY techniques, in which the design of Medium Access Control (MAC) is
integrated with physical layer channel information to improve the spectral efficiency, have been
studied under both the single-channel [15], [36], [41] and multi-channel [5], [24], [36], [37]
cases. Other recent related studies considered opportunistic carrier sensing in a cross-layer
design [4], [46]-[49]. A cross-layer MAC/PHY methodology is used in this paper to design
efficient distributed algorithms for the problems under study.

Distributed Learning and Optimization for a Fair Spectrum Sharing. Achieving
proportionally fair rates in spatial random access networks (which is considered in this paper under
the cooperative setting as described in Section I-B) has been studied under the single collision
channel case (K = 1) in [30]-[32] and the multi collision channel case (K > 1, M = 1) in
[42] (as considered in this paper). The algorithm developed in [42] uses a Gibbs sampler over
local maxima that converges to a global maximum as time increases. The algorithm requires
information sharing between nodes up to the second neighborhood at each iteration. It further requires
perfect synchronization in a neighborhood with respect to action updates in the sense that once
a node updates its strategy all its neighbors must update their strategies accordingly. In this
paper, however, we develop an algorithm that requires information sharing between a single
node and its neighbors only (i.e., first neighborhood) at each iteration, and synchronization in a
neighborhood with respect to action updates is not required. Once a node updates its strategy,
its neighbors may or may not update their strategies. Thus, convergence of our scheme is robust
against stubborn neighbors, temporary communication link failures, etc. The proposed algorithm
is based on log-linear learning techniques (see [50], [51] for more details on the theory of
log-linear learning), and we use a game theoretic perspective to analyze the algorithm's performance.
A similar idea of using altruistic plus selfish components in the algorithm design for a channel
and cell selection problem was identified in [40], where the global optimum was obtained via a
Gibbs sampler. The MAC layer protocol between users was assumed given, and the question of
interest concerned interference mitigation between cells that coexist in the
same frequency bands. Furthermore, the objective there was minimum potential
delay (and not proportionally fair rates as considered in this paper). On algorithm
development, the model in [40] requires each user to compute the aggregate utility of its own
and all users in the network that communicate with the same AP (via a utility of the form
1/f(SNR)). Consequently, the resulting algorithm in [40] is fundamentally different from the
one developed in this paper under the cooperative setting. Other related studies that use log-linear
learning and Gibbs sampling techniques under different spectrum access models and objectives
can be found in [52]-[56].

Game Theoretic Models for Communication Systems. Cooperative game theoretic
optimization has been studied under frequency flat interference channels in the SISO [11], [13],
MISO [17], [18] and MIMO cases [14]. The frequency selective interference channels case has
been studied in [9], [19]. The collision channels case has been studied under a fully connected
network and without information sharing between users in [24], where the global optimum was
attained under the asymptotic regime (i.e., as the number of users N approaches infinity) and the
i.i.d. assumption on the channel quality. In this paper, however, we study distributed optimization
of the user rates under the cooperative setting for spatial networks where information sharing
between neighbors is allowed. We show that proportionally fair rates are attained for any number
N > 1 of users without any assumption on the network topology or channel distribution.

Other related game theoretic models have been used in cellular, OFDMA, and 5G systems
[57]-[61]. In [57], the authors focused on a power control model, where exact and ordinal
potential game models have been investigated. In [58], a joint uplink/downlink subcarrier allocation
in OFDMA systems has been investigated via a two-sided stable matching game formulation.
In [59], the interference mitigation problem in the downlink of multicell networks via base
station coordination has been studied via a potential game framework. In [60], the authors
investigated channel utilization via a distributed matching approach. In [61], a distributed power
self-optimization problem has been studied for the downlink operation of dense femtocell
networks via a noncooperative exact potential game formulation. This paper, however, considers a
fundamentally different model, where communication is over collision channels (i.e.,
interference is caused by the MAC layer's attempt probabilities), and the optimization variables are
channel allocation and attempt probabilities. From a game theoretic perspective, we show that
some improvement paths may result in cycles under the noncooperative setting (thus, the game
dynamics do not obey exact or ordinal potential functions). Instead, we formulate the game
as a best-response potential game, where it is shown that best-response dynamics converge.

Spectrum Access as a Graph Coloring Problem. Another set of related works is concerned 
with modeling the spectrum access problem as a graph coloring problem, in which users and 
channels are represented by vertices and colors, respectively. Thus, coloring vertices such that 
two adjacent vertices do not share the same color is equivalent to allocating channels such that 
interference between neighbors is avoided (see [62]-[65] and references therein for related
works). However, the problem considered in this paper is different since we mainly focus on 
the case where the number of users is much larger than the number of channels (thus, coloring 
the graph may be infeasible). Furthermore, in our case users may select more than one channel, 
and may prefer some channels over others, as well as optimize their rates with respect to the 
attempt probability. 

The rest of the paper is organized as follows. In Section II we describe the network model.
In Sections III and IV we consider the noncooperative and cooperative settings, respectively. In
Section V we provide simulation results. Section VI concludes the paper.

II. Network Model 

We consider a wireless network consisting of a set $\mathcal{N} = \{1, 2, \dots, N\}$ of users (or transceiver
links) and a set $\mathcal{K} = \{1, 2, \dots, K\}$ of shared channels (where typically $N > K$). We focus
on a spatial wireless network, where each user is in the interference range of a few (but not
necessarily all) other users. We assume symmetric interference ranges for all users in the sense
that user $n$ is in user $r$'s interference range only if user $r$ is in user $n$'s interference range for all
$n, r \in \mathcal{N}$. We refer to users in the same interference range as neighbors, and define
$\mathcal{L}_n \subseteq \mathcal{N} \setminus \{n\}$ as the set of user $n$'s neighbors (i.e., the interference range
equals the communication range when considering communication between neighbors). We assume that
users are backlogged, i.e., all $N$ users always have packets to transmit. At the beginning of each
time slot, each user (say $n$) is allowed to transmit over $M$ channels ($1 \le M \le K$) with a certain
attempt probability (i.e., using the slotted-ALOHA protocol). Let $\mathcal{K}_M$ be the set of all
$M$-element subsets of $\mathcal{K}$ (i.e., $\mathcal{K}_M$ is the set of all channel-selection strategies
that a user can choose). Let $\sigma_n = (k_n, p_n)$ be the strategy of user $n$, where
$k_n = \{k_{n,1}, \dots, k_{n,M}\} \in \mathcal{K}_M$ denotes the set of chosen channels and
$0 \le p_n \le 1$ denotes the attempt probability of user $n$. Thus, when user $n$ decides to transmit
(which occurs with probability $p_n$) it uses all the channels in $k_n$ for transmission. We define $\sigma$
as the strategy profile for all users, and $\sigma_{-n}$ as the strategy profile for all users except user $n$.

The topology of the interference model can be represented by an undirected graph $G = (\mathcal{N}, E)$,
where the set of users is represented by the vertices and the interference relationships between
users are represented by the set of edges $E$. An edge $(n, r) \in E$ means that users $n$ and $r$ are in
the same interference range. The set $\mathcal{L}_n$ of user $n$'s neighbors is represented by the vertices
directly connected to vertex $n$, excluding vertex $n$ itself. An illustration is given in Fig. 4 in Section V.
We consider transmissions over orthogonal collision channels. Thus, transmission by user $n$
over channel $k_{n,i}$ is successful only if no user $r \in \mathcal{L}_n$ transmits over channel $k_{n,i}$ in the
same time slot. However, if user $n$ and at least one more user in $\mathcal{L}_n$ transmit simultaneously over
channel $k_{n,i}$ in the same time slot, a collision occurs. The achievable rate of user $n$ over channel $k$
given that a transmission is successful, referred to as collision-free utility, is denoted by $u_n(k) > 0$
(i.e., Shannon capacity). We consider long-term rates where $u_n(k)$ remains fixed across time slots
during the running time of the algorithms (e.g., mean-rate, or slow-fading effect). It should be
noted that the algorithm dynamics and convergence analysis hold under any network topology
and when rates (i.e., channel gains) may be different across users and frequencies. However,
equal channels are required for purposes of analysis in Section III-C.

Define the success probability of user $n$ on channel $k$, given the strategy profile of other users,
as follows:

$$ v_n(k, \sigma_{-n}) = \prod_{i \in \mathcal{L}_n} (1 - p_i)^{1_i(k)}, \tag{1} $$

where $1_i(k) = 1$ if $k \in k_i$ and $1_i(k) = 0$ otherwise. Hence, the expected rate of user $n$ over
channel $k_{n,i}$ is given by:

$$ r_n(k_{n,i}, p_n, \sigma_{-n}) = p_n u_n(k_{n,i}) v_n(k_{n,i}, \sigma_{-n}). \tag{2} $$

Note that the log-rate of user $n$ over channel $k_{n,i}$ is given by

$$ \log r_n(k_{n,i}, p_n, \sigma_{-n}) = \log\left(u_n(k_{n,i}) p_n\right) - I_n(k_{n,i}, \sigma_{-n}), \tag{3} $$

where $I_n(k, \sigma_{-n})$ is referred to as the log-interference function and is given by:

$$ I_n(k, \sigma_{-n}) = -\log v_n(k, \sigma_{-n}) = \sum_{i \in \mathcal{L}_n} \log\left(\frac{1}{1 - p_i}\right) 1_i(k). \tag{4} $$

Note that $I_n(k, \sigma_{-n})$ can be viewed as the log-interference that user $n$ experiences over channel
$k$ caused by its neighbors that transmit over the same channel. Finally, the expected rate of user
$n$ is given by:

$$ R_n(\sigma) = \sum_{i=1}^{M} r_n(k_{n,i}, p_n, \sigma_{-n}). \tag{5} $$
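
The quantities defined above can be evaluated directly on a small network. The following Python sketch is only an illustration: the three-user line topology, the numerical values, and the function names are assumptions made for this example and are not taken from the paper.

```python
import math


def success_prob(n, k, neighbors, channels, p):
    """Success probability of user n on channel k: no neighbor transmits on k."""
    v = 1.0
    for i in neighbors[n]:
        if k in channels[i]:  # indicator that neighbor i uses channel k
            v *= 1.0 - p[i]
    return v


def log_interference(n, k, neighbors, channels, p):
    """Log-interference: sum of log(1 / (1 - p_i)) over neighbors using channel k."""
    return sum(math.log(1.0 / (1.0 - p[i]))
               for i in neighbors[n] if k in channels[i])


def expected_rate(n, neighbors, channels, p, u):
    """Expected rate of user n: sum over its channels of p_n * u_n(k) * v_n(k)."""
    return sum(p[n] * u[n][k] * success_prob(n, k, neighbors, channels, p)
               for k in channels[n])


if __name__ == "__main__":
    # Illustrative line network 0 - 1 - 2 with K = 3 channels and M = 2.
    neighbors = {0: [1], 1: [0, 2], 2: [1]}
    channels = {0: {0, 1}, 1: {1, 2}, 2: {0, 2}}
    p = {0: 0.5, 1: 0.25, 2: 0.5}
    u = {n: {0: 1.0, 1: 2.0, 2: 3.0} for n in neighbors}
    print(expected_rate(1, neighbors, channels, p, u))  # 0.625
```

In this example user 1 collides with user 0 on channel 1 and with user 2 on channel 2, so each of its two channels succeeds with probability 0.5, giving 0.25 * (2.0 + 3.0) * 0.5 = 0.625.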

Throughout the paper, we will develop distributed algorithms to optimize certain objectives in
the network. Theoretically, convergence analysis often requires users to update their strategies
in a sequential manner. Avoiding simultaneous updates in communication systems is often done
by allowing each user to draw a random backoff time and update its strategy when its backoff
time expires (as discussed in Section I-C). For simplicity, we will assume a similar mechanism
here. Specifically, it is assumed that users hold a global clock and may update their strategies
only at times referred to as updating times. At each updating time, every user draws a
backoff time from a continuous uniform distribution over the range $[0, B]$ for some $B > 0$. A
user whose backoff time expires may broadcast a pilot signal to its neighbors, indicating that its
strategy has been updated, or start transmitting its data so that its neighbors can sense activity. Then,
all its neighbors keep their strategies fixed until the next updating time. Note that neighbors will
not update their strategies simultaneously, and the time interval for data transmissions is set to
be longer than $B$. At each updating time, we refer to users that update their strategies as active
users. The set of active users is denoted by $\mathcal{N}_a$ (which is time-varying across updating times).
In Tables I and II (Step 3) we refer to this mechanism as a selection of active users. It should be
noted, however, that convergence of the algorithm discussed in Section III-B will be shown even
without this coordination mechanism.
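
The backoff-based selection of active users described above can be sketched as follows. This is a simplified illustration under stated assumptions: pilot signals are assumed to be heard instantly and without error, and the function name `select_active_users` and the example topology are hypothetical.

```python
import random


def select_active_users(neighbors, B, rng):
    """Select the active users at one updating time.

    Every user draws a backoff uniformly from [0, B]. A user becomes active
    when its backoff expires, unless a neighbor became active earlier, in
    which case it keeps its strategy fixed until the next updating time.
    """
    backoff = {n: rng.uniform(0.0, B) for n in neighbors}
    active, frozen = set(), set()
    for n in sorted(backoff, key=backoff.get):  # users in order of expiry
        if n not in frozen:
            active.add(n)
            frozen.update(neighbors[n])  # neighbors hear the pilot and freeze
    return active


if __name__ == "__main__":
    # Illustrative line network 0 - 1 - 2 - 3.
    neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(select_active_users(neighbors, B=1.0, rng=random.Random(0)))
```

By construction no two neighbors are active at the same updating time, while users that are not neighbors may still update simultaneously.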

III. Distributed Rate Maximization: A Non-Cooperative Setting


In this section we consider the case where every user (say $n$) maximizes its own rate given
the current system state under a constraint on its allowed attempt probability, i.e., $p_n \le P_n$,
where $P_n < 1$ (see Section I-B for motivation of this problem). Since maximizing the rate given
the current system state results in a transmission with the maximal allowed attempt probability
$P_n$, the strategy for user $n$ degenerates to choosing the subset of channels that maximizes its
own rate under a fixed attempt probability $P_n$. As a result, the strategy played by user $n$ given a
fixed strategy profile of other users $\sigma_{-n}$ is given by $\sigma_n = (k_n^*, P_n)$, where
$k_n^* = \{k_{n,1}^*, \dots, k_{n,M}^*\}$ solves the following distributed rate maximization problem³:

$$ k_n^* = \arg\max_{k_n \in \mathcal{K}_M} R_n(\sigma) \quad \text{s.t.} \quad p_n = P_n. \tag{6} $$

Since $R_n(\sigma) = p_n \sum_{i=1}^{M} u_n(k_{n,i}) v_n(k_{n,i}, \sigma_{-n})$ and $p_n = P_n$ in (6) is a constant
independent of $k_n$, it suffices to solve:

$$ k_n^* = \arg\max_{k_n \in \mathcal{K}_M} \sum_{i=1}^{M} u_n(k_{n,i}) v_n(k_{n,i}, \sigma_{-n}). \tag{7} $$

For every user $n$, let $\{k_{n,1}^*, k_{n,2}^*, \dots, k_{n,K}^*\}$ be a permutation of $\{1, \dots, K\}$ such that:

$$ u_n(k_{n,1}^*) v_n(k_{n,1}^*, \sigma_{-n}) \ge u_n(k_{n,2}^*) v_n(k_{n,2}^*, \sigma_{-n}) \ge \cdots \ge u_n(k_{n,K}^*) v_n(k_{n,K}^*, \sigma_{-n}). \tag{8} $$

Following (7), the channel-selection strategy that solves (6) at each given updating time is given
by:

$$ k_n^* = \{k_{n,1}^*, k_{n,2}^*, \dots, k_{n,M}^*\}. \tag{9} $$
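
The selection rule above amounts to sorting the channels by the product of collision-free utility and success probability and keeping the first M. A minimal Python sketch follows; the function name and the numerical values are illustrative assumptions.

```python
def best_response_channels(u_n, v_n, M):
    """Return the indices of the M channels with the largest u_n(k) * v_n(k).

    u_n[k] is the collision-free utility and v_n[k] the success probability
    of the user on channel k; ties are broken by channel index.
    """
    ranked = sorted(range(len(u_n)), key=lambda k: u_n[k] * v_n[k], reverse=True)
    return ranked[:M]


if __name__ == "__main__":
    u_n = [1.0, 2.0, 3.0, 4.0]  # illustrative collision-free utilities
    v_n = [1.0, 0.9, 0.5, 0.2]  # illustrative success probabilities
    print(best_response_channels(u_n, v_n, M=2))  # [1, 2]
```

Note that the channel with the highest collision-free utility (index 3) is not selected here, because its low success probability makes its expected contribution the smallest.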


Note that in practical systems, user $n$ holds an estimate of $u_n(k)$ (from pilot signals, for
instance). On the other hand, complete information about other users' strategies is not required.
Monitoring the channels to obtain $v_n(k, \sigma_{-n})$ for all $k$ is sufficient to make a decision⁴. Hence,
for purposes of analysis in this section we assume that every user $n$ estimates $v_n(k, \sigma_{-n})$ perfectly
(i.e., monitors the channels for a sufficient time). In Section V, simulation results demonstrate
strong performance of the proposed algorithm in practical systems under estimation errors. Next,
we examine a distributed algorithm that uses $u_n(k)$, $v_n(k, \sigma_{-n})$ to solve the distributed rate
maximization problem.

³ For ease of presentation, we assume continuous random rates $u_n(k)$ to guarantee uniqueness of the maximizer. Otherwise,
channels with the same rate can be ordered arbitrarily.

⁴ Note that the number of idle time slots and busy time slots can be used to estimate the success probability. Monitoring the
channels can be done by the receiver (which can sense the spectrum and send this information to the transmitter). Another way
is to monitor the null period by the transmitter as in cognitive radio systems. Any attempt to access channel k by one user or
more results in identifying channel k as busy.
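
The channel monitoring step can be illustrated by a simple frequency estimate: the fraction of monitored slots in which a channel is idle approximates the success probability on that channel. The Python sketch below is an assumption-laden illustration (independent transmissions across slots, error-free sensing, hypothetical function names), not the estimator used in the simulations of this paper.

```python
import random


def estimate_success_prob(idle_slots, total_slots):
    """Estimate the success probability as the fraction of idle slots."""
    if total_slots <= 0:
        raise ValueError("at least one monitored slot is required")
    return idle_slots / total_slots


def simulate_monitoring(neighbor_probs, num_slots, rng):
    """Count the slots in which no neighbor on the channel transmits.

    neighbor_probs holds the attempt probabilities of the neighbors that
    use the monitored channel (illustrative input).
    """
    idle = 0
    for _ in range(num_slots):
        if all(rng.random() >= q for q in neighbor_probs):
            idle += 1
    return idle


if __name__ == "__main__":
    rng = random.Random(1)
    probs = [0.5, 0.25]  # true success probability is 0.5 * 0.75 = 0.375
    idle = simulate_monitoring(probs, num_slots=20000, rng=rng)
    print(estimate_success_prob(idle, 20000))
```

A longer monitoring window reduces the estimation error at the cost of a slower reaction to strategy changes by the neighbors.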

A. Best-Response Potential Game Formulation 

The system dynamics can be viewed as a non-cooperative game, in which every user
sequentially updates its strategy to increase its rate given the current system state irrespective of other
users' rates, referred to as the Distributed Rate Maximization (DRM) game. The strategy $k_n^*$ that
solves (6) represents a best-response (BR) strategy since a user chooses $k_n^*$ that maximizes its rate
given the current system state. On the other hand, switching from strategy $k_n$ to $k_n'$ to increase
the rate (but not maximizing it) such that $R_n(k_n', P_n, \sigma_{-n}) > R_n(k_n, P_n, \sigma_{-n})$ is called a
better-response. A system is in an equilibrium when users cannot increase their rates by unilaterally
changing their strategy.

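Definition 1: A Nash Equilibrium Point (NEP) for the DRM game is a strategy profile $\sigma^* = (\sigma_n^*, \sigma_{-n}^*)$,
where $k_{n'}^* \in \mathcal{K}_M$, $p_{n'} = P_{n'}$ for all $n' \in \mathcal{N}$, such that

$$R_n(k_n^*, P_n, \sigma_{-n}^*) \ge R_n(k_n, P_n, \sigma_{-n}^*) \quad \forall n, \; \forall k_n \in \mathcal{K}_M. \qquad (10)$$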


A game has the finite improvement property (FIP) if every improvement path, in which a
sequence of better-responses is executed by users sequentially, is finite. Clearly, a game with
the FIP converges to a NEP in finite time under any better-response dynamics. In what follows we
use the theory of potential games to analyze the convergence of the BR dynamics to a NEP
under the DRM game. In potential games, the incentive of users to switch strategies can be
expressed by a global potential function. A NEP for the game is reached at any local maximum
of the potential function. Next, we define a class of potential games related to the DRM game
at hand.

Definition 2 ([26]): The DRM game is referred to as a best-response potential game if there
is a best-response potential function $\phi : \sigma \to \mathbb{R}$ such that for every user $n$ and for every
$\sigma_{-n} = \{k_i, p_i\}_{i \ne n}$, where $k_i \in \mathcal{K}_M$, $p_i = P_i$, the following holds:

$$\arg\max_{k_n \in \mathcal{K}_M} R_n(k_n, P_n, \sigma_{-n}) = \arg\max_{k_n \in \mathcal{K}_M} \phi(k_n, P_n, \sigma_{-n}). \qquad (11)$$




Unlike other classes of potential games (e.g., exact, ordinal), which have the FIP, BR potential
games may exhibit cycles under some improvement paths. Nevertheless, no cycle occurs
when playing BR dynamics since the potential function increases at any BR. In the DRM game,
some improvement paths may result in cycles when $M > 1$, as shown in Appendix VII-A.
Nevertheless, the following theorem shows that the DRM game is a best-response potential
game.

Theorem 1: The DRM game is a best-response potential game, with the following best-response 
potential function: 



( 12 ) 


2=1 


Proof: The proof is given in Appendix VII-B.


Note that a variation of (12) was shown to be an ordinal potential function for a game with affine
utilities in [27] (i.e., any improvement path reaches an equilibrium in finite time). Theorem 1,
however, shows that best-response dynamics under the DRM game reach an equilibrium in
finite time although cycles may occur under some improvement paths.

Remark 1: It should be noted that when the constraints on the attempt probabilities satisfy
$p_n(k) \in \{0, P_n\}$ for all $k, n$, each user selects channels among the set of channels in which
$p_n(k) = P_n > 0$. Thus, it can be verified that Theorem 1 holds under this more general case as
well. This scenario captures the situation of a hierarchical model (as in cognitive radio networks).
An example of such attempt probability constraints is depicted in Fig. 1, where user 1 (high-
priority) is allowed to transmit over white spaces and 2.4GHz bands, while user 2 (low-priority)
is allowed to transmit over the 2.4GHz band only.

B. Best-Response Algorithm for Distributed Rate Maximization 

Following Theorem 1, we propose a non-cooperative BR algorithm to solve the constrained
distributed rate maximization problem in the spatial multi-channel ALOHA networks, dubbed 
BR for Distributed Rate Maximization (BR-DRM) algorithm. We initialize the algorithm by a 
simple solution where every user picks the M channels with the highest collision-free utility 


Fig. 1. An illustration of attempt probability constraints in a scenario of a hierarchical model in cognitive radio networks. User 
1 (high-priority) is allowed to transmit over white spaces and 2.4GHz bands, while user 2 (low-priority) is allowed to transmit 
over 2.4GHz band only. 


$u_n(k)$. In the learning process step, each user monitors the load on the channels to obtain
$v_n(k, \sigma_{-n})$ for all $k$ (see the beginning of Section III for more details on the monitoring process).
Then, at each updating time the selected active users (selected according to the mechanism
described in Section II) update their strategies by selecting the channels that maximize their rates
given the estimated load.
When users cannot increase their rates by unilaterally changing their strategy, an equilibrium
is obtained. The BR-DRM Algorithm is given in Table I. The set of active users in Step 3
is determined according to the distributed mechanism described in Sec. II. In Steps 5-7 the
user selects the channels for transmission based on the estimated load. Users repeat updating
strategies until their rates converge. During the running time of the algorithm the loads on
the channels change dynamically and affect user decisions across time. Convergence is
guaranteed following Theorem 1 since the best-response potential function is upper bounded
(by $\phi(\sigma) \le M \sum_{n=1}^{N} \log(\max_k u_n(k))$) and any local maximum is a NEP for the game
(since no user can increase its rate by unilaterally changing its strategy). It should be noted that
convergence in finite time of BR dynamics in the DRM game is preserved as long as no two active
users are neighbors (since the log-interference $I_n(k, \sigma_{-n})$ that user $n$ experiences is affected
only by users in $\mathcal{I}_n$, we assume that no simultaneous updates occur among neighbors), as
designed by the mechanism that selects the active users described in Section II.

Corollary 1: Assume that users update their strategy according to the mechanism described
in Section II. Then, the BR-DRM algorithm, given in Table I, converges to a NEP in finite time.
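
The sequential dynamics behind this convergence claim can be sketched as follows. This is an illustrative simulation rather than the exact BR-DRM procedure: it assumes $M = 1$, a common attempt probability `P` for all users, a ring topology, and perfect knowledge of the neighbors' channels in place of the load estimation step. All names (`br_drm`, `is_equilibrium`) and parameter values are hypothetical.

```python
import math
import random


def log_rate(n, channel, choice, neighbors, u, P):
    """log of u_n(k) * P * product of (1 - P) over neighbors using `channel`."""
    value = math.log(u[n][channel]) + math.log(P)
    for i in neighbors[n]:
        if choice[i] == channel:
            value += math.log(1.0 - P)
    return value


def br_drm(neighbors, u, P, K, max_rounds=1000):
    """Sequential best response with M = 1; users update one at a time."""
    N = len(neighbors)
    # Initialization: every user picks its best collision-free channel.
    choice = [max(range(K), key=lambda k: u[n][k]) for n in range(N)]
    for _ in range(max_rounds):
        changed = False
        for n in range(N):
            current = log_rate(n, choice[n], choice, neighbors, u, P)
            best = max(range(K), key=lambda k: log_rate(n, k, choice, neighbors, u, P))
            # Switch only on a strict improvement of the user's own rate.
            if log_rate(n, best, choice, neighbors, u, P) > current + 1e-12:
                choice[n] = best
                changed = True
        if not changed:
            return choice
    raise RuntimeError("no equilibrium found within max_rounds")


def is_equilibrium(choice, neighbors, u, P, K):
    """True if no user can strictly increase its rate by switching channels."""
    for n in range(len(choice)):
        current = log_rate(n, choice[n], choice, neighbors, u, P)
        if any(log_rate(n, k, choice, neighbors, u, P) > current + 1e-12 for k in range(K)):
            return False
    return True


rng = random.Random(0)
N, K, P = 8, 3, 0.5
neighbors = [[(n - 1) % N, (n + 1) % N] for n in range(N)]  # ring topology
u = [[rng.uniform(50.0, 100.0) for _ in range(K)] for _ in range(N)]
allocation = br_drm(neighbors, u, P, K)
print(allocation, is_equilibrium(allocation, neighbors, u, P, K))
```

Because users update one at a time, no two neighbors ever change strategy simultaneously, which mirrors the condition under which finite-time convergence is preserved.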
Next, we examine the case where simultaneous updates across neighbors may occur. An
example of this case is when a simpler mechanism is applied where each user (say $n$) updates its
strategy with a given probability strictly between 0 and 1, referred to as a probabilistic mechanism. Another
example is when communication errors between neighbors result in simultaneous updates. In
such cases, convergence to a NEP is achieved with high probability as time increases.

Proposition 1: Assume that users update their strategy according to the probabilistic mechanism.
Then, the BR-DRM algorithm converges to a NEP with probability 1 as time approaches
infinity.

Proof: Let $p_{\min}$ and $p_{\max}$ denote the minimum and maximum update probabilities over all users,
respectively. Since the DRM game is a potential game,
any NEP can be reached in finite time when users update their strategy in a sequential manner
(i.e., when no simultaneous updates occur) starting from any point. Thus, there exists a finite
integer $U$ which is the maximal number of updates needed to reach any NEP from any starting
point.

Next, consider the updating times $t_{\ell U - U + 1}, t_{\ell U - U + 2}, \ldots, t_{\ell U}$ for $\ell = 1, 2, \ldots$. Note that given
any strategy profile by time $t_{\ell U - U + 1}$ there exists a sequence of sequential strategy updates
across users during the updating times $t_{\ell U - U + 1}, t_{\ell U - U + 2}, \ldots, t_{\ell U}$ such that the system surely
reaches an equilibrium by time $t_{\ell U}$. Since the probability of each such update is greater than
$p_{\min} (1 - p_{\max})^{N-1}$ (i.e., the desired user updates its strategy and the strategies of all other $N - 1$ users
remain fixed), the probability of reaching an equilibrium at time $t_{\ell U}$ starting at time $t_{\ell U - U + 1}$ is
greater than $\left[ p_{\min} (1 - p_{\max})^{N-1} \right]^{U}$ for all $\ell = 1, 2, \ldots$. Similarly, the probability that the system
does not reach a NEP at time $t_{\ell U}$ starting at time $t_{\ell U - U + 1}$ is less than
$1 - \left[ p_{\min} (1 - p_{\max})^{N-1} \right]^{U}$.
Since this bound is independent of the starting point, the probability that the system does not
reach a NEP at time $t_{\ell U}$ is less than

$$\left( 1 - \left[ p_{\min} (1 - p_{\max})^{N-1} \right]^{U} \right)^{\ell}.$$

Thus, letting $\ell \to \infty$ completes the proof.

C. Efficiency of the BR-DRM Algorithm 

The convergence analysis provided in Section III-A implies that the BR-DRM algorithm
converges to a stable channel allocation. However, this stable operating point may not be a
system-wide optimal solution. Though simulation results demonstrate good performance of the
algorithm in terms of achievable user rate, in this section we provide a theoretical
guarantee on the performance that can be expected by implementing BR-DRM. We examine the
TABLE I

BR-DRM Algorithm

1) Initialize: each user (say $n$) estimates $u_n(k)$ for all $k$, and
   selects the $M$ channels with the highest $u_n(k)$
2) repeat (at each updating time)
3)    $\mathcal{N}_a \leftarrow$ updated set of active users
4)    for $n \in \mathcal{N}_a$ do
5)       estimate $v_n(k, \sigma_{-n})$ for all $k$
6)       $k_n^* \leftarrow \{k_1^*, k_2^*, \ldots, k_M^*\}$ by best response to the estimated load
7)       $(k_n, p_n) \leftarrow (k_n^*, P_n)$
8)    end for
9) until all rates converge


performance gain under BR-DRM (i.e., when users apply distributed learning of the dynamic
load to update their strategies) as compared to a naive algorithm, in which every user chooses
a channel randomly and does not apply the learning process to update its strategy. For purposes
of analysis, we consider the case where the network forms a $|\mathcal{I}|$-regular graph, and every user
experiences equal rates for all channels (when channels are free), i.e., $u_n = u_n(k) = u_n(k')$
for all $k, k'$. We set $P_n = K/(|\mathcal{I}| + 1)$ for all $n$ (which captures proportional fairness among
users as will be discussed in subsequent sections). We focus on the more interesting case where
$|\mathcal{I}| + 1 \ge K$ (thus, best response is used to mitigate interference among neighbors) and for
ease of presentation assume that $(|\mathcal{I}| + 1)/K \in \mathbb{Z}$.

Theorem 2: Assume that the assumptions presented in this section hold. Let $R_n^{\mathrm{BR-DRM}}$,
$R_n^{\mathrm{Naive}}$ be the expected rates of user $n$ achieved by the BR-DRM and naive algorithms, respectively. Then,
the ratio between the user rate achieved by the BR-DRM algorithm and the user rate achieved
by the naive algorithm is bounded by:

$$\frac{R_n^{\mathrm{BR-DRM}}}{R_n^{\mathrm{Naive}}} \ge \eta = \frac{\left(1 - \frac{K}{|\mathcal{I}|+1}\right)^{\frac{|\mathcal{I}|+1}{K} - 1}}{\left(1 - \frac{1}{|\mathcal{I}|+1}\right)^{|\mathcal{I}|}} \quad \forall n. \qquad (13)$$


Proof: To prove the theorem we first lower bound the achievable expected rate of user $n$
under BR-DRM. Let $\mathcal{I}_n(k)$ be the set of user $n$'s neighbors who select channel $k$. Assume to the
contrary that BR-DRM has converged and that user $n$ selects channel $k_1$ with $|\mathcal{I}_n(k_1)| > |\mathcal{I}|/K$.
Since there exists a channel $k_2$ with $|\mathcal{I}_n(k_2)| < |\mathcal{I}|/K$ (and thus a higher rate can be achieved over
channel $k_2$), best-response play by user $n$ cannot terminate with channel $k_1$ selected, which
contradicts the assumption. Since this argument holds for every user in the system, and
BR-DRM converges (in finite time) by Theorem 1, we have $|\mathcal{I}_n(k_n^*)| \le |\mathcal{I}|/K$ for all $n$ in equilibrium,
where $k_n^*$ is the channel selected by user $n$ at equilibrium. Since $(|\mathcal{I}|+1)/K$ is an integer, this
implies $|\mathcal{I}_n(k_n^*)| \le (|\mathcal{I}|+1)/K - 1$. As a result, the achievable rate of
user $n$ satisfies:

$$R_n^{\mathrm{BR-DRM}} \ge u_n \frac{K}{|\mathcal{I}|+1} \left(1 - \frac{K}{|\mathcal{I}|+1}\right)^{\frac{|\mathcal{I}|+1}{K} - 1} \quad \forall n. \qquad (14)$$

Next, we compute the expected user rate achieved by the naive algorithm, where every user
chooses a channel randomly without using CSI. Assume that user $n$ transmits over channel $k$.
Note that channel $k$ is selected by every other user with probability $1/K$, and every user that
picks channel $k$ actually transmits over it with probability $K/(|\mathcal{I}| + 1)$. Therefore, the expected
rate of user $n$ on channel $k$ is $R_n(k) = u_n \frac{K}{|\mathcal{I}|+1} \left(1 - \frac{1}{K} \cdot \frac{K}{|\mathcal{I}|+1}\right)^{|\mathcal{I}|}$. Since every channel is selected
with equal probability $1/K$, the expected rate of user $n$ achieved by the naive algorithm is given
by:

$$R_n^{\mathrm{Naive}} = u_n \frac{K}{|\mathcal{I}|+1} \left(1 - \frac{1}{K} \cdot \frac{K}{|\mathcal{I}|+1}\right)^{|\mathcal{I}|}. \qquad (15)$$


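Hence, the ratio between the expected user rate achieved by the BR-DRM algorithm and the
expected user rate achieved by the naive algorithm satisfies:

$$\frac{R_n^{\mathrm{BR-DRM}}}{R_n^{\mathrm{Naive}}} \ge \eta = \frac{\left(1 - \frac{K}{|\mathcal{I}|+1}\right)^{\frac{|\mathcal{I}|+1}{K} - 1}}{\left(1 - \frac{1}{K} \cdot \frac{K}{|\mathcal{I}|+1}\right)^{|\mathcal{I}|}} \quad \forall n. \qquad (16)$$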


Remark 2: Note that $\lim_{|\mathcal{I}| \to \infty} \left(1 - \frac{K}{|\mathcal{I}|+1}\right)^{-1} = 1$ and that both the numerator and the
denominator of $\eta$ approach $e^{-1}$ as $|\mathcal{I}|$ increases and $K$ is fixed. Thus, it can be verified that $\eta$ is
bounded by $1 \le \eta \le e$, where $\eta$ approaches 1 as $|\mathcal{I}|$ approaches infinity and $K$ is fixed, and $\eta$
approaches $e$ when $|\mathcal{I}| + 1 = K$ and $K$ approaches infinity. Thus, Theorem 2 provides insight
into the performance gain that can be expected from the BR-DRM algorithm. Specifically, under
the system model considered in this section, implementing BR-DRM guarantees that every user
in the system improves its performance (in terms of expected rate) as compared to the naive
algorithm. Significant performance gain is obtained when $|\mathcal{I}| + 1 = K$ (i.e., in situations of a low
collision level). In this case, we have $\eta = (1 - 1/K)^{-(K-1)}$. Thus, the user rate increases by more
than 100% for $K > 2$ (since the performance gain is greater than $\eta = 2$) and by more than 170%
for very large $K$ (since the performance gain approaches $e$) by implementing BR-DRM
as compared to the naive algorithm.
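
The values quoted in this remark can be checked numerically. The sketch below evaluates the bound $\eta$ of (13) for a regular graph; the function name `rate_gain_bound` and the argument name `degree` (standing for $|\mathcal{I}|$) are hypothetical, and the code relies on the convention $0^0 = 1$ (which Python follows) for the case $|\mathcal{I}| + 1 = K$.

```python
import math


def rate_gain_bound(K, degree):
    """Lower bound eta on the BR-DRM / naive rate ratio for a degree-regular graph."""
    n = degree + 1
    if n < K or n % K != 0:
        raise ValueError("requires degree + 1 >= K and (degree + 1) / K integer")
    # Numerator: at most (degree + 1) / K - 1 interfering neighbors under BR-DRM.
    numerator = (1.0 - K / n) ** (n // K - 1)
    # Denominator: each of the `degree` neighbors collides w.p. 1 / (degree + 1).
    denominator = (1.0 - 1.0 / n) ** degree
    return numerator / denominator


for K, degree in [(2, 1), (3, 2), (1000, 999), (2, 199)]:
    print(K, degree, rate_gain_bound(K, degree))
print("e =", math.e)
```

The first three settings satisfy $|\mathcal{I}| + 1 = K$, where the gain grows from 2 toward $e$; the last one keeps $K$ fixed while the degree is large, where the gain is close to 1.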


IV. Achieving Global Proportional Fairness: A Cooperative Setting


Instead of solving a distributed rate maximization as done in the preceding section, here we 
are interested in developing a distributed algorithm that attains proportionally fair rates in the 
network (using information sharing between neighbors only). Cooperation in this section refers 
to a social behavior (by designing a social utility function for each user) that can lead to a 
globally-optimal operating point. Nevertheless, the model is still cast as a non-cooperative game 
in the sense that users act with respect to their own social utility. We consider the case where 
$M = 1$. Thus, $k_n \in \mathcal{K}$ is a natural number denoting a single channel chosen by user $n$. Formally,
the problem is to find a strategy profile that maximizes the sum-log rate in the network:

$$\sigma^* = \arg\max_{\sigma} \sum_{n=1}^{N} \log R_n(\sigma). \qquad (17)$$


The above optimization problem (17) was first formulated in [30] under a variation of the
ALOHA model considered in this paper for single-channel systems (i.e., $K = 1$) and equal rates
for all links. Consistent with the previous section, it is convenient to view each user in the
network as a player that takes actions with respect to a local utility when solving a discrete
optimization problem, as suggested in [51]. In what follows we address problem (17) from
a game-theoretic perspective under the multi-channel case.
A. Exact Potential Game Formulation


In Section III we have shown that any NEP of the DRM game is a local maximum of its
potential function (12). In this section, however, we are interested in finding a global maximum
of (17) since it attains global proportional fairness in the network.

Let $\mathcal{I}_n(k)$ be the set of user $n$'s neighbors that transmit over channel $k$, and let

$$F_n(k_n, p_n, \sigma_{-n}) \triangleq \log\left(u_n(k_n) p_n\right) - I_n(k_n, \sigma_{-n}) - \log\left(\frac{1}{1 - p_n}\right) |\mathcal{I}_n(k_n)| \qquad (18)$$

be the cooperative utility (or fair utility) for user $n$. Note that the cooperative utility balances
between individual and social utilities. The term $\log(u_n(k_n) p_n) - I_n(k_n, \sigma_{-n})$ is the individual
utility for user $n$, whereas $\log\left(\frac{1}{1 - p_n}\right) |\mathcal{I}_n(k_n)|$ represents the aggregated log-interference that user
$n$ causes to its neighbors. Throughout this section it is assumed that user $n$ can compute its
cooperative utility when making decisions (see the discussion of a practical implementation in
Section IV-C). We refer to this game as the fairness game.

Next, we show that the fairness game is an exact potential game in which the objective function
of (17) is a potential function of the game.

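Definition 3 ([66]): The fairness game is called an exact potential game if there is an exact
potential function $\phi : \sigma \to \mathbb{R}$ such that for every user $n$ and for every $\sigma_{-n} = \{k_i, p_i\}_{i \ne n}$, where
$k_i \in \mathcal{K}$, $0 \le p_i \le 1$, the following holds:

$$F_n(\sigma_n^{(2)}, \sigma_{-n}) - F_n(\sigma_n^{(1)}, \sigma_{-n}) = \phi(\sigma_n^{(2)}, \sigma_{-n}) - \phi(\sigma_n^{(1)}, \sigma_{-n}),$$
$$\forall \sigma_n^{(1)} = (k_n^{(1)}, p_n^{(1)}), \; \sigma_n^{(2)} = (k_n^{(2)}, p_n^{(2)}), \quad k_n^{(1)}, k_n^{(2)} \in \mathcal{K}, \; 0 \le p_n^{(1)}, p_n^{(2)} \le 1. \qquad (19)$$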


Theorem 3: The fairness game is an exact potential game, with the following exact potential
function:

$$\phi(\sigma) = \sum_{n=1}^{N} \log R_n(\sigma). \qquad (20)$$


Proof: The proof is given in Appendix VII-C.
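
The exact potential property can be checked numerically on a small instance. The sketch below is an illustration under stated assumptions, not a substitute for the proof: it assumes a symmetric neighbor relation (a ring), $M = 1$, the rate $R_n = u_n(k_n) p_n \prod_{i \in \mathcal{I}_n(k_n)} (1 - p_i)$, and attempt probabilities strictly inside $(0, 1)$ so that all logarithms are finite. The function names are hypothetical.

```python
import math
import random


def log_rate(n, k, p, neighbors, u):
    """log R_n for channel choices k[.] and attempt probabilities p[.]."""
    value = math.log(u[n][k[n]]) + math.log(p[n])
    for i in neighbors[n]:
        if k[i] == k[n]:
            value += math.log(1.0 - p[i])
    return value


def potential(k, p, neighbors, u):
    """Sum-log rate over all users, i.e. the potential function (20)."""
    return sum(log_rate(n, k, p, neighbors, u) for n in range(len(k)))


def cooperative_utility(n, k, p, neighbors, u):
    """F_n: own log rate plus the log-interference caused to same-channel neighbors."""
    same_channel = sum(1 for i in neighbors[n] if k[i] == k[n])
    return log_rate(n, k, p, neighbors, u) + same_channel * math.log(1.0 - p[n])


rng = random.Random(1)
N, K = 5, 2
neighbors = [[(n - 1) % N, (n + 1) % N] for n in range(N)]  # ring topology
u = [[rng.uniform(50.0, 100.0) for _ in range(K)] for _ in range(N)]
k = [rng.randrange(K) for _ in range(N)]
p = [rng.uniform(0.1, 0.9) for _ in range(N)]

# User 0 unilaterally switches channel and attempt probability.
k_new, p_new = list(k), list(p)
k_new[0] = 1 - k[0]
p_new[0] = 0.3
dF = cooperative_utility(0, k_new, p_new, neighbors, u) - cooperative_utility(0, k, p, neighbors, u)
dphi = potential(k_new, p_new, neighbors, u) - potential(k, p, neighbors, u)
print(dF, dphi)
```

The two printed differences agree up to floating-point error, which is what Definition 3 requires for a unilateral deviation.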
B. Nash Equilibrium of the Fairness Game

Since the fairness game is an exact potential game with an upper bounded potential function
(by $\phi(\sigma) \le \sum_{n=1}^{N} \max_k \log(u_n(k))$), any BR dynamics converges to a NEP in the sense that
users cannot increase their cooperative utility by unilaterally changing their strategies. However,
any local maximum of the potential function (20) is a NEP of the game. Thus, here we first
characterize the structure of the NEPs of the fairness game. In Section IV-C we will use this result to
develop an algorithm that achieves the best NEP in the sense that the global maximum of (20)
is attained.

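Definition 4: A Nash Equilibrium Point (NEP) for the fairness game is a strategy profile
$\sigma^* = (\sigma_n^*, \sigma_{-n}^*)$, where $k_i^* \in \mathcal{K}$, $0 \le p_i^* \le 1$ for all $i \in \mathcal{N}$, such that

$$F_n(\sigma_n^*, \sigma_{-n}^*) \ge F_n(\sigma_n, \sigma_{-n}^*) \quad \forall n, \; \forall \sigma_n = (k_n, p_n) : k_n \in \mathcal{K}, \; 0 \le p_n \le 1.$$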


Theorem 4: A strategy profile $\sigma^* = \{k_n^*, p_n^*\}_{n=1}^{N}$ is a NEP for the fairness game if $k_n^* \in \mathcal{K}$,
$p_n^* = \frac{1}{|\mathcal{I}_n(k_n^*)| + 1}$ for all $n \in \mathcal{N}$.

Proof: Fix a strategy profile $\sigma_{-n}$ and assume that user $n$ updates its strategy. To prove the
theorem it suffices to show that for all $n$ and any $\sigma_{-n}$ the following holds:

$$\left(k_n = k_n^*, \; p_n^* = \frac{1}{|\mathcal{I}_n(k_n^*)| + 1}\right) = \arg\max_{k_n \in \mathcal{K}, \, 0 \le p_n \le 1} F_n(k_n, p_n, \sigma_{-n})$$
$$= \arg\max_{k_n \in \mathcal{K}, \, 0 \le p_n \le 1} \left[ \log u_n(k_n) + \log p_n - I_n(k_n, \sigma_{-n}) - \log\left(\frac{1}{1 - p_n}\right) |\mathcal{I}_n(k_n)| \right].$$

Note that for any $k_n \in \mathcal{K}$ the terms $\log u_n(k_n)$, $I_n(k_n, \sigma_{-n})$ are independent of $p_n$. Thus, it
suffices to show that for any given $k_n$ the following holds:

$$\frac{1}{|\mathcal{I}_n(k_n)| + 1} = \arg\max_{0 \le p_n \le 1} \left[ \log p_n + \log(1 - p_n) \, |\mathcal{I}_n(k_n)| \right].$$

The case where $|\mathcal{I}_n(k_n)| = 0$ is straightforward since user $n$ does not interfere with other users
(setting $p_n = 1$ maximizes the RHS by defining $0 \cdot \log 0 = 0$). Thus, we consider the case where
$|\mathcal{I}_n(k_n)| \ge 1$. Note that the function $\log p_n + \log(1 - p_n) \, |\mathcal{I}_n(k_n)|$ is a strictly concave function of
$p_n$ (for $0 < p_n < 1$). Therefore, it has a unique global maximum. Differentiating with respect to
$p_n$ and equating to zero yields $p_n^* = \frac{1}{|\mathcal{I}_n(k_n)| + 1}$, which completes the proof. ■
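
The maximizer derived in this proof can be sanity-checked by a brute-force search. The sketch below maximizes $\log p + m \log(1 - p)$ over an interior grid of $(0, 1)$, where `num_neighbors` plays the role of $m = |\mathcal{I}_n(k_n)| \ge 1$; the function names and the grid resolution are hypothetical choices made for illustration.

```python
import math


def attempt_objective(p, num_neighbors):
    """log p + num_neighbors * log(1 - p): the p-dependent part of F_n."""
    return math.log(p) + num_neighbors * math.log(1.0 - p)


def grid_argmax(num_neighbors, steps=10000):
    """Maximize the objective over the grid {1/steps, ..., (steps - 1)/steps}."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: attempt_objective(p, num_neighbors))


for m in range(1, 6):
    print(m, grid_argmax(m), 1.0 / (m + 1))
```

Because the objective is strictly concave, the grid maximizer lies within one grid step of the closed-form value $1/(m + 1)$.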

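Corollary 2: A local maximum of (20) is attained only if every user $n$ is associated with an
attempt probability $p_n = \frac{1}{|\mathcal{I}_n(k_n)| + 1}$. In particular, the strategy profile that attains proportionally
fair rates (i.e., the solution to (17)) must satisfy $p_n = \frac{1}{|\mathcal{I}_n(k_n)| + 1}$ for all $n$.

Theorem 5: Let $\{k_n\}_{n=1}^{N}$ be a given channel allocation for all users. A strategy profile

$$\sigma^* = \left\{ k_n, \; p_n^* = \frac{1}{|\mathcal{I}_n(k_n)| + 1} \right\}_{n=1}^{N} \qquad (22)$$

is the unique solution to the following optimization problem:

$$\{p_n^*\}_{n=1}^{N} = \arg\max_{\{p_n\}_{n=1}^{N}} \sum_{n \in \mathcal{N}_k} \log R_n\left(\{k_n, p_n\}_{n=1}^{N}\right) \quad \forall k \in \mathcal{K}, \qquad (23)$$

where $\mathcal{N}_k$ denotes the set of users that select channel $k$.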


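Proof: Let $\mathcal{N}_k$ be the set of users that select channel $k$. The achievable rate of user $n \in \mathcal{N}_k$
is given by:

$$R_n(\sigma) = u_n(k) \, p_n \prod_{i \in \mathcal{I}_n(k)} (1 - p_i),$$

where $\mathcal{I}_n(k)$ is the set of user $n$'s neighbors that transmit over channel $k$. Taking log on both
sides yields:

$$\log(R_n(\sigma)) = \log(u_n(k)) + \log(p_n) + \sum_{i \in \mathcal{I}_n(k)} \log(1 - p_i), \quad \forall n \in \mathcal{N}_k.$$

Let $L_k = \sum_{n \in \mathcal{N}_k} \log(R_n(\sigma))$ be the sum log rate on channel $k$. Hence,

$$L_k = \sum_{n \in \mathcal{N}_k} \left[ \log(u_n(k)) + \log(p_n) + \sum_{i \in \mathcal{I}_n(k)} \log(1 - p_i) \right].$$

Note that $L_k$ is a strictly concave function of $p_n$, $n \in \mathcal{N}_k$. Therefore, it has a unique global
maximum. Differentiating $L_k$ with respect to $p_n$, $n \in \mathcal{N}_k$, and equating to zero yields $p_n^* = \frac{1}{|\mathcal{I}_n(k)| + 1}$
for all $n \in \mathcal{N}_k$, which completes the proof. ■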
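
For a fixed channel allocation, the optimal attempt probabilities can be computed directly from the neighbor counts. The sketch below does this for a hypothetical 4-user path graph and confirms that perturbing a single probability lowers the sum-log rate. It assumes a symmetric neighbor relation and, for simplicity, one collision-free rate per user (`u[n]`) that does not depend on the channel; the function names are illustrative.

```python
import math


def optimal_attempt_probabilities(channel, neighbors):
    """p_n = 1 / (|I_n(k_n)| + 1) for a fixed channel allocation."""
    probs = []
    for n, nbrs in enumerate(neighbors):
        same = sum(1 for i in nbrs if channel[i] == channel[n])
        probs.append(1.0 / (same + 1))
    return probs


def sum_log_rate(channel, p, neighbors, u):
    """Sum over users of log(u_n p_n prod_{i in I_n(k_n)} (1 - p_i))."""
    total = 0.0
    for n, nbrs in enumerate(neighbors):
        total += math.log(u[n]) + math.log(p[n])
        total += sum(math.log(1.0 - p[i]) for i in nbrs if channel[i] == channel[n])
    return total


# Path graph 0 - 1 - 2 - 3; users 0, 1, 2 share channel 0, user 3 is alone on channel 1.
neighbors = [[1], [0, 2], [1, 3], [2]]
channel = [0, 0, 0, 1]
u = [100.0, 100.0, 100.0, 100.0]
p_star = optimal_attempt_probabilities(channel, neighbors)
print(p_star, sum_log_rate(channel, p_star, neighbors, u))
```

A user with no same-channel neighbor transmits with probability 1, and its probability never enters any other user's rate, so no logarithm of zero is evaluated.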

Combining Theorems 4 and 5 yields:

Corollary 3: A strategy profile $\sigma^* = \{k_n^*, p_n^*\}_{n=1}^{N}$ is a NEP for the fairness game if $\{p_n^*\}_{n=1}^{N}$
solves (23).

Corollary 2 follows directly from the NEPs' structure characterized in Theorem 4. We will use
the fact that attaining the global maximum of (20) implies $p_n = \frac{1}{|\mathcal{I}_n(k_n)| + 1}$ for all $n$ to design
a distributed learning algorithm that converges to the solution of (17). Corollary 3 sheds light
on the operating points of the system. Learning algorithms used to converge to a global optimum
may spend some time at local maxima of the objective function (i.e., a NEP). Corollary 3 shows
that the local maxima of the potential function may not be so bad. Specifically, every NEP of
the fairness game can be viewed as a local proportional fairness in the sense that proportionally
fair rates are attained among all users that share channel $k$ for all $k \in \mathcal{K}$.


C. Distributed Cooperative Learning Algorithm 


The optimization problem in (17) is a combinatorial optimization problem over a graph, and
it requires a centralized solution that uses global information, which is impractical in large-scale
networks. Therefore, we propose a probabilistic approach to solve the problem in a distributed
manner. We develop a distributed cooperative learning algorithm, dubbed Noisy BR for Fairness
(NBRF) algorithm, with the goal of solving (17) using limited message exchanges between
neighbors only. NBRF is a cooperative algorithm in the sense that users make decisions with
respect to the cooperative utility that balances between their own utilities and the interference
level they cause to their neighbors.

Recall that BR dynamics may lead to local maxima of the potential function. Hence, instead
of playing purely BR, in NBRF users play noisy BR (also known as spatial adaptive play or log-
linear learning) when updating their strategies [50], [51], [67]. In NBRF, active users construct
a probability mass function (pmf) over their actions and draw their actions according to this
distribution. Typically, the BR is played with high probability, while other strategies are played
with a probability that decays exponentially fast with the myopic utility loss in order to escape
local maxima. Specifically, the pmf over the available actions is given by:

$$\Pr\left((k_n, p_n) = (k, p)\right) = \frac{e^{\beta F_n(k, p, k_{-n}, p_{-n})}}{\sum_{k'=1}^{K} \sum_{r=1}^{|\mathcal{I}_n|+1} e^{\beta F_n(k', 1/r, k_{-n}, p_{-n})}} \qquad (24)$$

for some exploration parameter $\beta > 0$. For ease of presentation, we assume continuous
random rates $u_n(k)$ to guarantee uniqueness of the maximizer (otherwise BRs are drawn
uniformly). Note that when $\beta = 0$ the pmf assigns equal weights to all strategies, while the
TABLE II

NBRF Algorithm

1) Initialize: based on message exchanges between neighbors,
   each user (say $n$) sets $k_n \leftarrow \arg\max_k \{u_n(k)\}$
   and $p_n \leftarrow 1/(|\mathcal{I}_n(k_n)| + 1)$.
2) repeat (at each updating time)
3)    $\mathcal{N}_a \leftarrow$ updated set of active users
4)    for $n \in \mathcal{N}_a$ do
5)       draw $(k_n, p_n)$ randomly according
         to the distribution given in (24).
6)       send a packet containing $(k_n, p_n)$ to
         inform all neighbors $\mathcal{I}_n$
7)    end for
8) until all rates converge


probability of playing BR approaches one as $\beta \to \infty$ (a discussion on the setting of $\beta$ based on
simulated annealing analysis [68] is provided at the end of this section). The NBRF Algorithm
is given in Table II. The set of active users in Step 3 is determined according to the distributed
mechanism described in Sec. II. Step 5 requires the active users to construct the pmf given in
(24) based on the computation of $F_n(k, p, k_{-n}, p_{-n})$ for all $k = 1, \ldots, K$, $p = 1, 1/2, \ldots, 1/(|\mathcal{I}_n| + 1)$,
given in (18). In Step 6, active users must send complete information about their updated
strategies to their neighbors such that all users can compute their cooperative utility at each given
updating time. A mechanism similar to the one described in Section II can be applied, where the pilot
signal is now replaced by a packet containing complete information about the updated strategy.
Users may repeat updating strategies until their rates converge, or for a predetermined number
of iterations, and then stick to their BR (see the discussion at the end of this section).
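
The draw in Step 5 can be sketched as a softmax over the available actions. The code below is an illustration rather than the full NBRF procedure: it assumes the cooperative utilities have already been computed and are finite, and it takes them as a dictionary keyed by the action $(k, p)$. The function names and the toy utility values are hypothetical.

```python
import math
import random


def noisy_br_pmf(utilities, beta):
    """Pr(action) proportional to exp(beta * F(action)).

    utilities maps an action (k, p) to the cooperative utility of that action.
    """
    peak = max(utilities.values())
    # Subtracting the peak leaves the pmf unchanged and avoids overflow.
    weights = {a: math.exp(beta * (f - peak)) for a, f in utilities.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def draw_action(pmf, rng=random):
    """Draw one action according to the pmf."""
    actions = list(pmf)
    return rng.choices(actions, weights=[pmf[a] for a in actions], k=1)[0]


# Toy utilities for three actions (channel, attempt probability).
utilities = {(0, 1.0): 1.0, (0, 0.5): 2.0, (1, 1.0): 0.0}
pmf_uniform = noisy_br_pmf(utilities, beta=0.0)
pmf_greedy = noisy_br_pmf(utilities, beta=50.0)
print(pmf_uniform)
print(pmf_greedy)
print(draw_action(pmf_greedy, random.Random(0)))
```

With `beta = 0` all actions are equally likely, while a large `beta` concentrates almost all probability on the best response, matching the two limiting cases described in the text.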

The following theorem shows that NBRF attains proportional fairness with an arbitrarily high
probability as time increases.

Theorem 6: Let $\sigma^{\mathrm{NBRF}}(t)$ and $\Sigma^*$ be the strategy profile under NBRF (with a parameter $\beta$) at
time $t$ and the set of strategy profiles that solve (17), respectively. For any $\epsilon > 0$ there exists
$\beta > 0$ such that

$$\lim_{t \to \infty} \Pr\left(\sigma^{\mathrm{NBRF}}(t) \in \Sigma^*\right) \ge 1 - \epsilon. \qquad (25)$$


Proof: The proof is based on the results reported in Section IV-B and the fact that a
noisy best response dynamics following (24) in exact potential games converges to a stationary
distribution of the Markov chain corresponding to the game [67]. By Theorem 3, the fairness
game with the cooperative utility is an exact potential game with an exact potential function
$\phi$ given in (20). Since NBRF plays noisy BR with respect to $F_n$, the stationary distribution of
the strategy profile is given by [67]:

$$\Pr\left(\sigma^{\mathrm{NBRF}} = \sigma\right) = \frac{e^{\beta \phi(\sigma)}}{\sum_{\sigma'} e^{\beta \phi(\sigma')}}. \qquad (26)$$

Next, note that the number of user $n$'s neighbors that transmit over channel $k_n$, $|\mathcal{I}_n(k_n)|$, is
upper bounded by $|\mathcal{I}_n|$ for all $n$. Therefore, following Corollary 2, the strategy profile $\sigma^*$ that
attains the global maximum of (20) lies inside the action space played by NBRF. Therefore, for
every $\epsilon > 0$ we can choose $\beta > 0$ sufficiently large such that the stationary distribution puts a
sufficiently high weight on the strategy profile that maximizes (20) (i.e., $\phi$ in (26)). Thus, (25)
is satisfied as time approaches infinity. ■

Following the proof of Theorem 6, the stationary distribution of the homogeneous Markov
chain with a fixed $\beta$ corresponding to the game is given by (26). As a result, as the probability
of playing BR increases (i.e., by increasing $\beta$) the probability of attaining the global maximum of
the potential function (20) increases with time. Achieving the optimal solution (i.e., letting $\epsilon \to 0$
in (25)) requires $\beta$ to approach infinity. However, increasing $\beta$ too fast may push the algorithm
into a local maximum for a long time (since the probability of not playing BR is too small).
Next, let $\beta = \beta(t)$ be a function of time. The process of increasing $\beta(t)$ during the algorithm
is also known as cooling the system in simulated annealing analysis, where $T(t) = 1/\beta(t)$
represents the temperature. Following simulated annealing analysis [68], convergence to the
optimal solution is attained by increasing $\beta(t)$ as $\beta(t) = \log(t)/A$, $t = 1, 2, \ldots$ ($A$ is a constant
and will be discussed in the sequel). As a result, users explore strategy profiles in the beginning
of the algorithm and stick to their BR as time approaches infinity. In cases where the optimal
operating point is not unique, the algorithm may converge to one of the optimal operating
points. An alternative way is to set a piecewise constant $\beta(t)$ over time as suggested in [69].
Let $\{t_k\}$ be an increasing time sequence, with $t_1 = 1$, where $\beta(t) = k$ is kept fixed for all
$t_k \le t < t_{k+1}$. Intuitively speaking, the total time between $t_k$ and $t_{k+1}$ should be large enough,
such that the stationary Markov chain associated with the system under a fixed $\beta(t) = k$
approaches arbitrarily close to a steady state (with the stationary distribution given in (26)). Following
simulated annealing analysis in [69], it suffices to let ~ Note that the piecewise 

constant update has a logarithmic order with time since $\beta(t_k) = k \approx \log t_k / A$. It suffices to
set the constant $A$ to be greater than the maximal change in the objective function. Since the
maximal value of the objective function is upper bounded by Alogmax„^fc„ and the 

minimal value is lower bounded by A(log j “ max„ |J„| log2), it suffices to set 

A> N (^log (max„,fc„ Un{kn)) - log Amax^ |4| log2) to achieve convergence. 

It should be noted, however, that simulation results demonstrate fast convergence to the optimal 
solution with much smaller values of A under typical scenarios. 

V. Numerical Examples 

In this section we provide numerical examples to illustrate the performance of the algorithms. 
We simulated the following network: N users were randomly dispersed (uniformly) in a circle 
region with a radius of 10 meters. Each user causes interference to all users in a radius of 5 
meters. Every user can choose one channel for transmission among K channels. We assume 
equal achievable rates Un{k) = 100Mbps for all users on all channels when channels are free 
(i.e., collision-free utility). We performed 1,000 Monte-Carlo experiments and averaged the
performance over experiments. The randomness for each trial over which the average performance
is plotted comes from the random dynamic nature of the user updates (thus, each experiment 
results in a different update dynamic and might even converge to a different equilibrium point). 

We first consider the distributed rate maximization problem under the non-cooperative setting, 
where each user maximizes its own rate under a constraint on the attempt probability. The 
estimation of $v_n$ is based on a moving window of 100 packets. We first examine a small connected
network with $N = 10$ users sharing $K = 2$ channels, so that the centralized optimal exhaustive
search solution (in terms of sum-log rate) can be computed and serve as a benchmark for
comparison. An illustration of the small network is depicted in Fig. 4. In Fig. 2 we present
the average rate to demonstrate the performance in terms of efficiency under fixed attempt
probabilities $P = 2/3$ for all users. Though optimality is not guaranteed under BR-DRM due to
its greedy nature, it can be seen that under this small network model BR-DRM converges to the
optimal channel allocation (in terms of sum-log rate) in finite time and significantly improves
performance as compared to a random channel allocation. Next, we examine the case of a large
network, in which the number of users varies over time, to demonstrate the robustness of the
proposed algorithm. We initialized the network size to $N = 250$ users, where $N/2$ users are
allowed to transmit with attempt probability $P = 0.7$ (e.g., primary or high-priority users) and
$N/2$ users are allowed to transmit with attempt probability $P = 0.3$ (e.g., secondary or low-
priority users). We set the number of channels to $K = 30$ (i.e., a channel represents a subset
of subcarriers as in OFDMA or an allocation to PALs in the context of spectrum sharing); in this
case computing the optimal solution is intractable. We first increase the network size by adding
10 users after 100 iterations. Then, we increase the network size more aggressively by adding
another 40 users. It can be seen that BR-DRM converges to the equilibrium points very fast,
and significantly outperforms the naive algorithm at all time instants. Note that we can further
increase the robustness of the algorithms by allowing users to update their strategies only when
they improve their rates by more than a predefined value.



Fig. 2. Average rate as a function of the number of iterations. A wireless network containing 10 users and 2 channels. Each 
user transmits with an attempt probability 2/3. 


Second, we consider the cooperative setting, where the goal is to find a channel allocation 
and attempt probabilities in a distributed manner so as to attain proportionally fair rates among 
Fig. 3. Average rate as a function of the number of iterations under a time-varying network size. $N = 250, 260, 300$ for
$1 \le t \le 100$, $100 < t \le 200$, $200 < t \le 300$, respectively (where $t$ denotes the iteration index). In the top figure, the average
rate of the $N/2$ users with attempt probability 0.7 is presented. In the bottom figure, the average rate of the $N/2$ users with attempt
probability 0.3 is presented.


users. We compare the NBRF algorithm, given in Table III, with the random channel allocation 
scheme, in which the attempt probabilities are set optimally for the given random channel allocation 
(i.e., $p_n = 1/(|\mathcal{I}_n(k_n)| + 1)$ for all n). In the NBRF algorithm, we set $\beta = \log t$ (where t = 1, 2, ... 
indicates the iteration number) to construct the pmf in Step 6. We first examine a small connected 
network with N = 10 users sharing K = 2 channels, so that the optimal centralized solution can be 
computed by exhaustive search and serve as a benchmark for comparison. An illustration of 
the small network is depicted in Fig. 4. In Fig. 5, we present the average log rate to demonstrate 
the performance in terms of proportional fairness, and the average rate to demonstrate 
the achievable effective rates. NBRF significantly improves performance over random channel 
allocation in terms of both fairness and efficiency, even though the attempt probabilities are 
optimal for the given random allocation. Moreover, NBRF approaches the optimal centralized 
solution as time increases. This result demonstrates 
the efficiency of the proposed distributed learning algorithm in achieving global proportional 
fairness in the network. 
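As an illustration of the benchmark scheme, the following sketch computes the attempt probabilities $p_n = 1/(|\mathcal{I}_n(k_n)| + 1)$ for a given channel allocation, together with the resulting rates and average log rate. The three-user line network, the unit utilities, and the function names are assumptions chosen for the example; they are not taken from the simulations.

```python
import math


def optimal_attempt_probs(neighbors, channel):
    """p_n = 1 / (|I_n(k_n)| + 1), where I_n(k_n) is the set of neighbors
    of user n that transmit over the same channel k_n."""
    probs = []
    for n, nbrs in enumerate(neighbors):
        co_channel = [j for j in nbrs if channel[j] == channel[n]]
        probs.append(1.0 / (len(co_channel) + 1))
    return probs


def rates(u, neighbors, channel, p):
    """R_n = u_n(k_n) * p_n * prod over co-channel neighbors j of (1 - p_j)."""
    out = []
    for n, nbrs in enumerate(neighbors):
        success = 1.0
        for j in nbrs:
            if channel[j] == channel[n]:
                success *= 1.0 - p[j]
        out.append(u[n][channel[n]] * p[n] * success)
    return out


# Hypothetical line network 0 - 1 - 2 with two channels (indexed 0 and 1).
neighbors = [[1], [0, 2], [1]]
channel = [0, 0, 1]
u = [[1.0, 1.0] for _ in range(3)]

p = optimal_attempt_probs(neighbors, channel)
r = rates(u, neighbors, channel, p)
avg_log_rate = sum(math.log(x) for x in r) / len(r)
print(p, r, avg_log_rate)
```

Users 0 and 1 share channel 0 and each backs off to probability 1/2, while user 2 has no co-channel neighbor and transmits with probability 1.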

Next, we consider a large network in which the number of users varies over time, to 
demonstrate the robustness of the proposed NBRF algorithm. We initialize the network with 
N = 80 users and set the number of channels to K = 10 (in this case, computing the optimal 
solution is intractable). We first increase the network size by adding 5 users after 200 iterations. 
Then, we increase the network size more aggressively by adding another 15 users after 400 iterations. It can be 
seen that NBRF approaches the equilibrium points very fast and significantly outperforms the 
random channel allocation at all time instants. 



Fig. 4. An illustration of a small connected network with 10 users spatially distributed in a circular area of radius 10 meters. The 
users share 2 channels. Each pair of users at a distance of less than 2 meters (represented by an edge) causes mutual interference 
when transmitting simultaneously over the same channel. 



[Plot for Fig. 5. Legend: optimal proportionally fair rates; NBRF algorithm; random channel allocation. Horizontal axis: number of iterations.] 



Fig. 5. Average sum-log rate and average rate as a function of the number of iterations. A wireless network containing 10 
users and 2 channels. 


VI. Conclusion 

The distributed optimization problem over multiple collision channels shared by spatially 
distributed users was considered. We examined both the non-cooperative and cooperative settings. 
Under the non-cooperative setting, we developed a distributed learning algorithm for the 
distributed rate maximization problem, in which each user maximizes its own rate irrespective of 
Fig. 6. Average sum-log rate and average rate as a function of the number of iterations under a time-varying network size. 
N = 80, 85, 100 for 1 ≤ t ≤ 200, 200 < t ≤ 400, 400 < t ≤ 600, respectively (where t denotes the iteration index). A wireless 
network containing 80 to 100 users and 10 channels. 


other users' utilities. Convergence was proved using the theory of best-response potential games. 
Under the cooperative setting, we developed a distributed cooperative learning algorithm to 
achieve global proportional fairness in the network. While direct computation of the optimal 
solution is impractical in large-scale networks, we showed that the proposed distributed algorithm 
converges to the global optimum with high probability as time increases. Simulation results 
demonstrated strong performance of the algorithms. 

Future research directions include convergence-time analysis of the proposed algorithms, analysis of 
their performance under malicious or malfunctioning nodes, and extensions of the efficiency 
analysis of the NEPs under the non-cooperative setting. 

VII. Appendix 

A. Occurrence of Cycles in the DRM Game Under Better-Response Dynamics 

In Theorem 1 in Section IV-A, we have shown that the DRM game is a best-response potential 
game for M > 1 (i.e., no cycles occur when a best-response dynamics is implemented). Here, 
we provide an example showing that cycles may occur when a better-response dynamics is 
implemented for M > 1. Assume N = 2 users, K = 4 channels, M = 2, and $P_1 = P_2 = 0.5$. 
Consider the following utility matrix: 

$$
\begin{bmatrix} u_1(1) & u_1(2) & u_1(3) & u_1(4) \\ u_2(1) & u_2(2) & u_2(3) & u_2(4) \end{bmatrix}
=
\begin{bmatrix} 1 & 2 & 1 & 2 \\ 2 & 1 & 2 & 1 \end{bmatrix}
$$

and an initial strategy profile: 

$$
\sigma^{(0)} =
\begin{bmatrix} P_1 & P_1 & 0 & 0 \\ 0 & P_2 & P_2 & 0 \end{bmatrix}
=
\begin{bmatrix} 0.5 & 0.5 & 0 & 0 \\ 0 & 0.5 & 0.5 & 0 \end{bmatrix},
\tag{28}
$$

where row n lists the attempt probabilities of user n over channels 1 to 4. 

Next, we present a better-response dynamics which results in a cycle. At updating time $t_1$, 
user 1 updates its strategy by switching from channels 1, 2 (with rate 0.5 · 1 + 0.25 · 2 = 1) 
to channels 3, 4 (with a higher rate 0.5 · 2 + 0.25 · 1 = 1.25): 

$$
\sigma^{(1)} = \begin{bmatrix} 0 & 0 & 0.5 & 0.5 \\ 0 & 0.5 & 0.5 & 0 \end{bmatrix}.
\tag{29}
$$

At updating time $t_2$, user 2 updates its strategy by switching from channels 2, 3 (with rate 
0.5 · 1 + 0.25 · 2 = 1) to channels 1, 4 (with a higher rate 0.5 · 2 + 0.25 · 1 = 1.25): 

$$
\sigma^{(2)} = \begin{bmatrix} 0 & 0 & 0.5 & 0.5 \\ 0.5 & 0 & 0 & 0.5 \end{bmatrix}.
\tag{30}
$$

At updating time $t_3$, user 1 updates its strategy by switching from channels 3, 4 (with rate 
0.5 · 1 + 0.25 · 2 = 1) to channels 1, 2 (with a higher rate 0.5 · 2 + 0.25 · 1 = 1.25): 

$$
\sigma^{(3)} = \begin{bmatrix} 0.5 & 0.5 & 0 & 0 \\ 0.5 & 0 & 0 & 0.5 \end{bmatrix}.
\tag{31}
$$

At updating time $t_4$, user 2 updates its strategy by switching from channels 1, 4 (with rate 
0.5 · 1 + 0.25 · 2 = 1) to channels 2, 3 (with a higher rate 0.5 · 2 + 0.25 · 1 = 1.25): 

$$
\sigma^{(4)} = \begin{bmatrix} 0.5 & 0.5 & 0 & 0 \\ 0 & 0.5 & 0.5 & 0 \end{bmatrix}.
\tag{32}
$$

Since $\sigma^{(4)} = \sigma^{(0)}$, the dynamics returns to the initial strategy profile, which closes the cycle. 
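The cycle can be checked numerically. The sketch below replays the four updates using the utility matrix and attempt probabilities of the example; the helper `rate` and the variable names are introduced here for illustration only. It also evaluates the best response of user 1 at the initial profile, to show that the first move is a better response but not a best response.

```python
from itertools import combinations

P = 0.5                      # common attempt probability P_1 = P_2
CHANNELS = (1, 2, 3, 4)
u = {1: {1: 1, 2: 2, 3: 1, 4: 2},   # utilities of user 1
     2: {1: 2, 2: 1, 3: 2, 4: 1}}   # utilities of user 2


def rate(user, own, other):
    """Rate of `user` on channel set `own` when the other user occupies `other`.
    A channel shared with the other user succeeds with probability 1 - P."""
    return sum(P * u[user][k] * ((1 - P) if k in other else 1.0) for k in own)


initial = {1: frozenset({1, 2}), 2: frozenset({2, 3})}
moves = [(1, {3, 4}), (2, {1, 4}), (1, {1, 2}), (2, {2, 3})]

profile = dict(initial)
history = []
for user, new in moves:
    other = profile[3 - user]            # the only other user
    old_rate = rate(user, profile[user], other)
    new_rate = rate(user, new, other)
    history.append((old_rate, new_rate))
    profile[user] = frozenset(new)

# Best achievable rate of user 1 at the initial profile (over all channel pairs).
best_first = max(rate(1, set(c), initial[2]) for c in combinations(CHANNELS, 2))

print(history)             # each update raises the rate from 1.0 to 1.25
print(profile == initial)  # the dynamics is back at the initial profile
print(best_first)          # a best response would have achieved more than 1.25
```

Every update strictly improves the updating user's rate, yet the profile returns to its starting point. The first move reaches rate 1.25 while a best response of user 1 would reach 1.5, which is consistent with cycles being excluded only under best-response dynamics.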


B. Proof of Theorem 1 

Consider two strategies for user $n_0$, $\sigma_{n_0}^{(1)} = (k_{n_0}^{(1)}, P_{n_0})$ and $\sigma_{n_0}^{(2)} = (k_{n_0}^{(2)}, P_{n_0})$, and fix the strategy 
profile for all other users $\sigma_{-n_0}$. Throughout the proof, the superscript (i) refers to the user 
strategies given that user $n_0$ plays strategy $\sigma_{n_0}^{(i)}$, for i = 1, 2. The term $I_n(k, k', P, \sigma_{-n,n_0})$ refers 
to the log-interference function when user n chooses channel k, user $n_0$ plays strategy (k', P), 
and all other users except users n, $n_0$ play strategy profile $\sigma_{-n,n_0}$. 
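Step 1: The Improvement in User $n_0$'s Rate: 

Assume that $\sigma_{n_0}^{(2)}$ is a BR strategy for user $n_0$, i.e., 

$$
R_{n_0}(\sigma_{n_0}^{(2)}, \sigma_{-n_0}) - R_{n_0}(\sigma_{n_0}^{(1)}, \sigma_{-n_0}) \geq 0
\tag{33}
$$

for all $\sigma_{n_0}^{(1)}$ such that $k_{n_0}^{(1)} \in \mathcal{K}_M$. Let $k_{n_0,1}^{(2)}, \ldots, k_{n_0,K}^{(2)}$ be a permutation of $\{1, \ldots, K\}$ 
such that: 

$$
u_{n_0}(k_{n_0,1}^{(2)})\, v_{n_0}(k_{n_0,1}^{(2)}, \sigma_{-n_0}) \geq u_{n_0}(k_{n_0,2}^{(2)})\, v_{n_0}(k_{n_0,2}^{(2)}, \sigma_{-n_0}) \geq \cdots \geq u_{n_0}(k_{n_0,K}^{(2)})\, v_{n_0}(k_{n_0,K}^{(2)}, \sigma_{-n_0}).
\tag{34}
$$

The BR channel selection $k_{n_0}^{(2)}$ is then given by: 

$$
k_{n_0}^{(2)} = \{k_{n_0,1}^{(2)}, k_{n_0,2}^{(2)}, \ldots, k_{n_0,M}^{(2)}\}.
\tag{35}
$$

Next, arrange the entries of $k_{n_0}^{(1)} = \{k_{n_0,1}^{(1)}, k_{n_0,2}^{(1)}, \ldots, k_{n_0,M}^{(1)}\}$ such that: 

$$
u_{n_0}(k_{n_0,1}^{(1)})\, v_{n_0}(k_{n_0,1}^{(1)}, \sigma_{-n_0}) \geq u_{n_0}(k_{n_0,2}^{(1)})\, v_{n_0}(k_{n_0,2}^{(1)}, \sigma_{-n_0}) \geq \cdots \geq u_{n_0}(k_{n_0,M}^{(1)})\, v_{n_0}(k_{n_0,M}^{(1)}, \sigma_{-n_0}).
\tag{36}
$$

As a result, by the construction we have: 

$$
u_{n_0}(k_{n_0,i}^{(2)})\, v_{n_0}(k_{n_0,i}^{(2)}, \sigma_{-n_0}) \geq u_{n_0}(k_{n_0,i}^{(1)})\, v_{n_0}(k_{n_0,i}^{(1)}, \sigma_{-n_0}) \quad \forall i = 1, \ldots, M.
\tag{37}
$$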
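The construction in (34)-(37) amounts to sorting the channels by the product $u_{n_0}(k)\, v_{n_0}(k, \sigma_{-n_0})$ and keeping the first M. The sketch below illustrates this and checks the elementwise dominance in (37) against every other M-subset. The numerical values of `u` and `v`, the 0-based channel indices, and the function names are assumptions made for the example.

```python
from itertools import combinations


def best_response_channels(u, v, M):
    """Return the M channels (0-based) with the largest products u[k] * v[k]."""
    order = sorted(range(len(u)), key=lambda k: u[k] * v[k], reverse=True)
    return order[:M]


def dominates(u, v, first, second):
    """Elementwise comparison of the sorted products of two channel sets, as in (37)."""
    a = sorted((u[k] * v[k] for k in first), reverse=True)
    b = sorted((u[k] * v[k] for k in second), reverse=True)
    return all(x >= y for x, y in zip(a, b))


# Hypothetical utilities and success probabilities for K = 4 channels.
u = [1.0, 2.0, 1.0, 2.0]
v = [1.0, 0.5, 0.5, 1.0]
M = 2

br = best_response_channels(u, v, M)
all_dominated = all(dominates(u, v, br, c) for c in combinations(range(len(u)), M))
print(br, all_dominated)
```

Because the best response collects the M largest products, its i-th largest product is at least the i-th largest product of any other selection, which is the property used in the remainder of the proof.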


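Next, define 

$$
\tilde{k}_{n_0}^{(1)} \triangleq k_{n_0}^{(1)} \setminus k_{n_0}^{(2)} = \{\tilde{k}_{n_0,1}^{(1)}, \ldots, \tilde{k}_{n_0,L}^{(1)}\}, \qquad
\tilde{k}_{n_0}^{(2)} \triangleq k_{n_0}^{(2)} \setminus k_{n_0}^{(1)} = \{\tilde{k}_{n_0,1}^{(2)}, \ldots, \tilde{k}_{n_0,L}^{(2)}\}.
\tag{38}
$$

For example, if user $n_0$ selects channels $k_{n_0}^{(1)} = \{1, 2, 3\}$ and $k_{n_0}^{(2)} = \{3, 4, 5\}$ according to 
strategies $\sigma_{n_0}^{(1)}$ and $\sigma_{n_0}^{(2)}$, respectively, then $\tilde{k}_{n_0}^{(1)} = \{1, 2\}$, $\tilde{k}_{n_0}^{(2)} = \{4, 5\}$, and L = 2. Note that 
$\tilde{k}_{n_0}^{(1)}$, $\tilde{k}_{n_0}^{(2)}$ have the same cardinality (say $L = |\tilde{k}_{n_0}^{(1)}| = |\tilde{k}_{n_0}^{(2)}| \leq M$) and denote the differences in 
the chosen channels under strategies $\sigma_{n_0}^{(1)}$ and $\sigma_{n_0}^{(2)}$ (i.e., $\tilde{k}_{n_0}^{(1)} \cap \tilde{k}_{n_0}^{(2)} = \emptyset$). We arrange the entries of $\tilde{k}_{n_0}^{(i)}$ 
such that: 

$$
u_{n_0}(\tilde{k}_{n_0,1}^{(i)})\, v_{n_0}(\tilde{k}_{n_0,1}^{(i)}, \sigma_{-n_0}) \geq u_{n_0}(\tilde{k}_{n_0,2}^{(i)})\, v_{n_0}(\tilde{k}_{n_0,2}^{(i)}, \sigma_{-n_0}) \geq \cdots \geq u_{n_0}(\tilde{k}_{n_0,L}^{(i)})\, v_{n_0}(\tilde{k}_{n_0,L}^{(i)}, \sigma_{-n_0})
\tag{39}
$$

for i = 1, 2. 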
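The difference sets of the example can be reproduced with ordinary set operations. In the sketch below, the per-channel values `value[k]` (standing for the product of utility and success probability on channel k) are hypothetical and chosen so that {3, 4, 5} is indeed the best response for M = 3.

```python
# Channel selections from the example in the text.
k1 = {1, 2, 3}
k2 = {3, 4, 5}

# Channels used under one strategy but not the other.
k1_only = k1 - k2
k2_only = k2 - k1
L = len(k1_only)

# Hypothetical per-channel products (utility times success probability).
value = {1: 0.2, 2: 0.4, 3: 0.9, 4: 0.8, 5: 0.6}

# Order each difference set by decreasing product and pair the entries.
ordered1 = sorted(k1_only, key=lambda k: value[k], reverse=True)
ordered2 = sorted(k2_only, key=lambda k: value[k], reverse=True)
pairs = list(zip(ordered1, ordered2))
pairwise_ok = all(value[b] >= value[a] for a, b in pairs)
print(k1_only, k2_only, L, pairs, pairwise_ok)
```

The common channel 3 drops out of both sets, so only the L paired channels need to be compared in the remainder of the argument.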

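By the construction and using the monotonicity of the logarithm, we obtain 

$$
\Delta\tilde{R}_{n_0}(\tilde{k}_{n_0,i}^{(1)}, \tilde{k}_{n_0,i}^{(2)}, \sigma_{-n_0}) \triangleq
\log(u_{n_0}(\tilde{k}_{n_0,i}^{(2)})) - I_{n_0}(\tilde{k}_{n_0,i}^{(2)}, \sigma_{-n_0})
- \left( \log(u_{n_0}(\tilde{k}_{n_0,i}^{(1)})) - I_{n_0}(\tilde{k}_{n_0,i}^{(1)}, \sigma_{-n_0}) \right) \geq 0 \quad \forall i = 1, \ldots, L.
\tag{40}
$$

Step 2: The Difference in the Potential Function: 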
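Next, to prove the theorem we need to show that 

$$
\Delta\phi(\sigma_{n_0}^{(1)}, \sigma_{n_0}^{(2)}, \sigma_{-n_0}) \geq 0.
$$

The difference in the proposed potential function, $\Delta\phi$, is given by: 

$$
\begin{aligned}
\Delta\phi(\sigma_{n_0}^{(1)}, \sigma_{n_0}^{(2)}, \sigma_{-n_0})
&= \phi(\sigma_{n_0}^{(2)}, \sigma_{-n_0}) - \phi(\sigma_{n_0}^{(1)}, \sigma_{-n_0}) \\
&= \sum_{n=1}^{N} \log\left(\frac{1}{1-P_n}\right) \sum_{i=1}^{M} \left( \log u_n(k_{n,i}^{(2)}) - \frac{1}{2} I_n(k_{n,i}^{(2)}, \sigma_{-n}^{(2)}) \right) \\
&\quad - \sum_{n=1}^{N} \log\left(\frac{1}{1-P_n}\right) \sum_{i=1}^{M} \left( \log u_n(k_{n,i}^{(1)}) - \frac{1}{2} I_n(k_{n,i}^{(1)}, \sigma_{-n}^{(1)}) \right) \\
&\overset{(a)}{=} \sum_{i=1}^{L} \Bigg[ \sum_{n \in \mathcal{I}_{n_0} : \tilde{k}_{n_0,i}^{(1)} \in k_n} \log\left(\frac{1}{1-P_n}\right) \left( \log u_n(\tilde{k}_{n_0,i}^{(1)}) - \frac{1}{2} I_n(\tilde{k}_{n_0,i}^{(1)}, k_{n_0}^{(2)}, P_{n_0}, \sigma_{-n,n_0}) - \log u_n(\tilde{k}_{n_0,i}^{(1)}) + \frac{1}{2} I_n(\tilde{k}_{n_0,i}^{(1)}, k_{n_0}^{(1)}, P_{n_0}, \sigma_{-n,n_0}) \right) \\
&\qquad + \sum_{n \in \mathcal{I}_{n_0} : \tilde{k}_{n_0,i}^{(2)} \in k_n} \log\left(\frac{1}{1-P_n}\right) \left( \log u_n(\tilde{k}_{n_0,i}^{(2)}) - \frac{1}{2} I_n(\tilde{k}_{n_0,i}^{(2)}, k_{n_0}^{(2)}, P_{n_0}, \sigma_{-n,n_0}) - \log u_n(\tilde{k}_{n_0,i}^{(2)}) + \frac{1}{2} I_n(\tilde{k}_{n_0,i}^{(2)}, k_{n_0}^{(1)}, P_{n_0}, \sigma_{-n,n_0}) \right) \\
&\qquad + \log\left(\frac{1}{1-P_{n_0}}\right) \left( \log u_{n_0}(\tilde{k}_{n_0,i}^{(2)}) - \frac{1}{2} I_{n_0}(\tilde{k}_{n_0,i}^{(2)}, \sigma_{-n_0}) - \log u_{n_0}(\tilde{k}_{n_0,i}^{(1)}) + \frac{1}{2} I_{n_0}(\tilde{k}_{n_0,i}^{(1)}, \sigma_{-n_0}) \right) \Bigg] \\
&\triangleq \sum_{i=1}^{L} f(i).
\end{aligned}
\tag{41}
$$

Equality (a) follows since only user $n_0$ and the users in $\mathcal{I}_{n_0}$ that transmit over channels $\tilde{k}_{n_0,i}^{(1)}$, $\tilde{k}_{n_0,i}^{(2)}$ experience a 
change in their interference level. Thus, it suffices to show that every term in the summation is 
nonnegative (i.e., $f(i) \geq 0$ for all i = 1, ..., L). After rearranging terms we have: 

$$
\begin{aligned}
f(i) &= \frac{1}{2} \sum_{n \in \mathcal{I}_{n_0} : \tilde{k}_{n_0,i}^{(1)} \in k_n} \log\left(\frac{1}{1-P_n}\right) \left( I_n(\tilde{k}_{n_0,i}^{(1)}, k_{n_0}^{(1)}, P_{n_0}, \sigma_{-n,n_0}) - I_n(\tilde{k}_{n_0,i}^{(1)}, k_{n_0}^{(2)}, P_{n_0}, \sigma_{-n,n_0}) \right) \\
&\quad - \frac{1}{2} \sum_{n \in \mathcal{I}_{n_0} : \tilde{k}_{n_0,i}^{(2)} \in k_n} \log\left(\frac{1}{1-P_n}\right) \left( I_n(\tilde{k}_{n_0,i}^{(2)}, k_{n_0}^{(2)}, P_{n_0}, \sigma_{-n,n_0}) - I_n(\tilde{k}_{n_0,i}^{(2)}, k_{n_0}^{(1)}, P_{n_0}, \sigma_{-n,n_0}) \right) \\
&\quad + \log\left(\frac{1}{1-P_{n_0}}\right) \left( \log u_{n_0}(\tilde{k}_{n_0,i}^{(2)}) - \frac{1}{2} I_{n_0}(\tilde{k}_{n_0,i}^{(2)}, \sigma_{-n_0}) - \log u_{n_0}(\tilde{k}_{n_0,i}^{(1)}) + \frac{1}{2} I_{n_0}(\tilde{k}_{n_0,i}^{(1)}, \sigma_{-n_0}) \right) \\
&= \frac{1}{2} \log\left(\frac{1}{1-P_{n_0}}\right) \sum_{n \in \mathcal{I}_{n_0} : \tilde{k}_{n_0,i}^{(1)} \in k_n} \log\left(\frac{1}{1-P_n}\right)
 - \frac{1}{2} \log\left(\frac{1}{1-P_{n_0}}\right) \sum_{n \in \mathcal{I}_{n_0} : \tilde{k}_{n_0,i}^{(2)} \in k_n} \log\left(\frac{1}{1-P_n}\right) \\
&\quad + \log\left(\frac{1}{1-P_{n_0}}\right) \left( \log u_{n_0}(\tilde{k}_{n_0,i}^{(2)}) - \frac{1}{2} I_{n_0}(\tilde{k}_{n_0,i}^{(2)}, \sigma_{-n_0}) - \log u_{n_0}(\tilde{k}_{n_0,i}^{(1)}) + \frac{1}{2} I_{n_0}(\tilde{k}_{n_0,i}^{(1)}, \sigma_{-n_0}) \right),
\end{aligned}
$$

where the last equality follows by the fact that user $n_0$ contributes $-\log(1 - P_{n_0})$ to the log-interference 
when transmitting over a channel. Since $\sum_{n \in \mathcal{I}_{n_0} : k \in k_n} \log\left(\frac{1}{1-P_n}\right) = I_{n_0}(k, \sigma_{-n_0})$, after rearranging terms we have: 

$$
\begin{aligned}
f(i) &= \log\left(\frac{1}{1-P_{n_0}}\right) \log u_{n_0}(\tilde{k}_{n_0,i}^{(2)})
 - \log\left(\frac{1}{1-P_{n_0}}\right) I_{n_0}(\tilde{k}_{n_0,i}^{(2)}, \sigma_{-n_0}) \\
&\quad + \log\left(\frac{1}{1-P_{n_0}}\right) I_{n_0}(\tilde{k}_{n_0,i}^{(1)}, \sigma_{-n_0})
 - \log\left(\frac{1}{1-P_{n_0}}\right) \log u_{n_0}(\tilde{k}_{n_0,i}^{(1)}) \\
&= \log\left(\frac{1}{1-P_{n_0}}\right) \Delta\tilde{R}_{n_0}(\tilde{k}_{n_0,i}^{(1)}, \tilde{k}_{n_0,i}^{(2)}, \sigma_{-n_0}) \geq 0
\end{aligned}
$$

for all i, where the inequality follows from (40). Hence, $\Delta\phi(\sigma_{n_0}^{(1)}, \sigma_{n_0}^{(2)}, \sigma_{-n_0}) \geq 0$ follows. Furthermore, $\phi(\sigma)$ is upper bounded by 

$$
\phi(\sigma) \leq M \sum_{n=1}^{N} \log\left(\frac{1}{1-P_n}\right) \max_k \log(u_n(k)).
$$

As a result, $\phi(\sigma)$ is a bounded best-response potential function of the DRM game, which 
completes the proof. ■ 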
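The identity behind Step 2 can be checked numerically. The sketch below assumes the potential has the form $\phi(\sigma) = \sum_n \log\frac{1}{1-P_n} \sum_{k \in k_n} \left(\log u_n(k) - \frac{1}{2} I_n(k, \sigma_{-n})\right)$ with $I_n(k, \sigma_{-n}) = \sum_{j \in \mathcal{I}_n : k \in k_j} \log\frac{1}{1-P_j}$ and a symmetric neighbor relation, as used in the derivation above. The random instance, the function names, and the choice of user 0 as the deviating user are assumptions made for the test.

```python
import math
import random


def log_interference(n, k, neighbors, chans, P):
    """I_n(k): sum of log(1/(1-P_j)) over neighbors j of n transmitting on k."""
    return sum(-math.log(1.0 - P[j]) for j in neighbors[n] if k in chans[j])


def potential(u, neighbors, chans, P):
    """Assumed potential: weighted sum of log u_n(k) - I_n(k)/2 over chosen channels."""
    total = 0.0
    for n in range(len(P)):
        weight = -math.log(1.0 - P[n])
        total += weight * sum(
            math.log(u[n][k]) - 0.5 * log_interference(n, k, neighbors, chans, P)
            for k in chans[n])
    return total


def log_rate_sum(n, u, neighbors, chans, P):
    """Sum over the chosen channels of log u_n(k) - I_n(k)."""
    return sum(math.log(u[n][k]) - log_interference(n, k, neighbors, chans, P)
               for k in chans[n])


def check(seed, N=6, K=4, M=2):
    rng = random.Random(seed)
    neighbors = [set() for _ in range(N)]
    for a in range(N):
        for b in range(a + 1, N):
            if rng.random() < 0.5:      # symmetric (mutual) interference
                neighbors[a].add(b)
                neighbors[b].add(a)
    P = [rng.uniform(0.1, 0.9) for _ in range(N)]
    u = [[rng.uniform(0.5, 2.0) for _ in range(K)] for _ in range(N)]
    chans = [set(rng.sample(range(K), M)) for _ in range(N)]
    n0 = 0
    new = list(chans)
    new[n0] = set(rng.sample(range(K), M))   # unilateral channel change of user n0
    delta_phi = potential(u, neighbors, new, P) - potential(u, neighbors, chans, P)
    weight = -math.log(1.0 - P[n0])
    expected = weight * (log_rate_sum(n0, u, neighbors, new, P)
                         - log_rate_sum(n0, u, neighbors, chans, P))
    return delta_phi, expected


results = [check(seed) for seed in range(20)]
print(max(abs(d - e) for d, e in results))
```

In every random instance, the change in the potential equals $\log\frac{1}{1-P_{n_0}}$ times the change in user $n_0$'s summed log terms, so the sign of $\Delta\phi$ is governed by the terms in (40).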


C. Proof of Theorem 2 

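Consider two strategies for user $n_0$, $\sigma_{n_0}^{(1)} = (k_{n_0} = k^{(1)}, p_{n_0} = p^{(1)})$ and $\sigma_{n_0}^{(2)} = (k_{n_0} = k^{(2)}, p_{n_0} = p^{(2)})$, 
and fix the strategy profile for all other users $\sigma_{-n_0}$. Throughout the proof, the superscript 
(i) refers to the user strategies given that user $n_0$ plays strategy $\sigma_{n_0}^{(i)}$, for i = 1, 2. The term 
$I_n(k_n, k^{(i)}, p^{(i)}, \sigma_{-n,n_0})$ refers to the log-interference function when user n chooses channel 
$k_n$, user $n_0$ plays strategy $(k^{(i)}, p^{(i)})$, and all other users except users n, $n_0$ play strategy $\sigma_{-n,n_0}$. 

The difference in the payoff function, $\Delta F_{n_0}$, is given by: 

$$
\begin{aligned}
\Delta F_{n_0}(\sigma_{n_0}^{(1)}, \sigma_{n_0}^{(2)}, \sigma_{-n_0})
&= F_{n_0}(\sigma_{n_0}^{(2)}, \sigma_{-n_0}) - F_{n_0}(\sigma_{n_0}^{(1)}, \sigma_{-n_0}) \\
&= \left[ \log\left(u_{n_0}(k^{(2)})\, p^{(2)}\right) - I_{n_0}(k^{(2)}, \sigma_{-n_0}) + |\mathcal{I}_{n_0}(k^{(2)})| \log(1 - p^{(2)}) \right] \\
&\quad - \left[ \log\left(u_{n_0}(k^{(1)})\, p^{(1)}\right) - I_{n_0}(k^{(1)}, \sigma_{-n_0}) + |\mathcal{I}_{n_0}(k^{(1)})| \log(1 - p^{(1)}) \right].
\end{aligned}
$$

We prove the theorem for $k^{(1)} \neq k^{(2)}$. The case where $k^{(1)} = k^{(2)}$ follows similarly with minor 
modifications. The difference in the proposed function (20), $\Delta\phi$, is given by: 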



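$$
\begin{aligned}
\Delta\phi(\sigma_{n_0}^{(1)}, \sigma_{n_0}^{(2)}, \sigma_{-n_0})
&= \phi(\sigma_{n_0}^{(2)}, \sigma_{-n_0}) - \phi(\sigma_{n_0}^{(1)}, \sigma_{-n_0}) \\
&= \sum_{n \neq n_0} \left[ \log u_n(k_n) + \log p_n - I_n(k_n, k^{(2)}, p^{(2)}, \sigma_{-n,n_0}) \right]
 + \log u_{n_0}(k^{(2)}) + \log p^{(2)} - I_{n_0}(k^{(2)}, \sigma_{-n_0}) \\
&\quad - \sum_{n \neq n_0} \left[ \log u_n(k_n) + \log p_n - I_n(k_n, k^{(1)}, p^{(1)}, \sigma_{-n,n_0}) \right]
 - \left( \log u_{n_0}(k^{(1)}) + \log p^{(1)} - I_{n_0}(k^{(1)}, \sigma_{-n_0}) \right) \\
&= - \sum_{n \in \mathcal{I}_{n_0} : k_n = k^{(1)}} \log(1 - p^{(1)})
 + \sum_{n \in \mathcal{I}_{n_0} : k_n = k^{(2)}} \log(1 - p^{(2)}) \\
&\quad + \log u_{n_0}(k^{(2)}) + \log p^{(2)} - I_{n_0}(k^{(2)}, \sigma_{-n_0})
 - \left( \log u_{n_0}(k^{(1)}) + \log p^{(1)} - I_{n_0}(k^{(1)}, \sigma_{-n_0}) \right) \\
&= - \log(1 - p^{(1)})\, |\mathcal{I}_{n_0}(k^{(1)})| + \log(1 - p^{(2)})\, |\mathcal{I}_{n_0}(k^{(2)})| \\
&\quad + \log u_{n_0}(k^{(2)}) + \log p^{(2)} - I_{n_0}(k^{(2)}, \sigma_{-n_0})
 - \left( \log u_{n_0}(k^{(1)}) + \log p^{(1)} - I_{n_0}(k^{(1)}, \sigma_{-n_0}) \right) \\
&= \Delta F_{n_0}(\sigma_{n_0}^{(1)}, \sigma_{n_0}^{(2)}, \sigma_{-n_0}),
\end{aligned}
$$

where we used the facts that only users in $\mathcal{I}_{n_0}$ that transmit over channels $k^{(1)}$ and $k^{(2)}$ experience 
a change in their interference level, and the contributions of user $n_0$ to the log-interference 
experienced by its neighbors that transmit over channels $k^{(1)}$ and $k^{(2)}$ are $-\log(1 - p^{(1)})$ and 
$-\log(1 - p^{(2)})$, respectively. Hence, $\Delta\phi = \Delta F_{n_0}$ holds for every unilateral deviation. Furthermore, $\phi(\sigma)$ is upper bounded as follows: 
$\phi(\sigma) \leq \sum_{n=1}^{N} \max_k \log(u_n(k))$. As a result, $\phi(\sigma)$ in (20) is a bounded exact potential function 
of the fairness game, which completes the proof. ■ 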
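The exact-potential property can also be checked numerically. The sketch below assumes the sum-log-rate potential $\phi(\sigma) = \sum_n \left[\log u_n(k_n) + \log p_n - I_n(k_n, \sigma_{-n})\right]$ and the payoff $F_n(\sigma) = \log(u_n(k_n) p_n) - I_n(k_n, \sigma_{-n}) + |\mathcal{I}_n(k_n)| \log(1 - p_n)$, which is the form implied by the last steps of the derivation. The random instance, the function names, and the choice of user 0 as the deviating user are assumptions made for the test.

```python
import math
import random


def log_interference(n, chan, p, neighbors):
    """I_n(k_n): sum of log(1/(1-p_j)) over neighbors j on the same channel as n."""
    return sum(-math.log(1.0 - p[j]) for j in neighbors[n] if chan[j] == chan[n])


def potential(u, chan, p, neighbors):
    """Assumed potential: the sum of the users' log rates."""
    return sum(math.log(u[n][chan[n]]) + math.log(p[n])
               - log_interference(n, chan, p, neighbors)
               for n in range(len(p)))


def payoff(n, u, chan, p, neighbors):
    """Assumed payoff: own log rate plus the effect on co-channel neighbors."""
    co_channel = sum(1 for j in neighbors[n] if chan[j] == chan[n])
    return (math.log(u[n][chan[n]] * p[n])
            - log_interference(n, chan, p, neighbors)
            + co_channel * math.log(1.0 - p[n]))


def check(seed, N=6, K=3):
    rng = random.Random(seed)
    neighbors = [set() for _ in range(N)]
    for a in range(N):
        for b in range(a + 1, N):
            if rng.random() < 0.5:      # symmetric (mutual) interference
                neighbors[a].add(b)
                neighbors[b].add(a)
    u = [[rng.uniform(0.5, 2.0) for _ in range(K)] for _ in range(N)]
    chan = [rng.randrange(K) for _ in range(N)]
    p = [rng.uniform(0.1, 0.9) for _ in range(N)]
    n0 = 0
    new_chan, new_p = list(chan), list(p)
    new_chan[n0] = rng.randrange(K)          # unilateral deviation of user n0
    new_p[n0] = rng.uniform(0.1, 0.9)
    delta_phi = potential(u, new_chan, new_p, neighbors) - potential(u, chan, p, neighbors)
    delta_f = payoff(n0, u, new_chan, new_p, neighbors) - payoff(n0, u, chan, p, neighbors)
    return delta_phi, delta_f


results = [check(seed) for seed in range(20)]
print(max(abs(d - f) for d, f in results))
```

The two differences agree for deviations in the channel, in the attempt probability, or in both, which is what distinguishes an exact potential from the best-response potential of the DRM game.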